Re: Error in pg_restore (could not close data file: Success)

2020-10-20 Thread Kyotaro Horiguchi
At Tue, 20 Oct 2020 13:48:25 +0300, Andrii Tkach  wrote in 
> Hello,
> After restoring a 'directory' backup with pg_restore (PostgreSQL 10.6) I
> got this message:
> pg_restore: [directory archiver] could not close data file: Success
> pg_restore: [parallel archiver] a worker process died unexpectedly
> In this thread:
> https://www.postgresql.org/message-id/CAFcNs%2Bos5ExGvXMBrBBzzuJJamoHt5-zdJdxX39nkVG0KUxwsw%40mail.gmail.com
> there is only one answer. I'm interested: is this normal behavior of
> pg_restore, and was the backup restored normally or not?

That would be a broken compressed file, perhaps caused by a full disk.

This reminded me of another thread. The issue above seems to be the same
as this one:

https://www.postgresql.org/message-id/flat/20200416.181945.759179589924840062.horikyota.ntt%40gmail.com#ed85c5fda64873c45811be4e3027a2ea

Me> Hmm. Sounds reasonable.  I'm going to do that.  Thanks!

But somehow that hasn't happened; I'll come up with a new version.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Add statistics to pg_stat_wal view for wal related parameter tuning

2020-10-20 Thread Amit Kapila
On Tue, Oct 20, 2020 at 12:41 PM Masahiro Ikeda
 wrote:
>
> On 2020-10-20 12:46, Amit Kapila wrote:
> > On Tue, Oct 20, 2020 at 8:01 AM Masahiro Ikeda
> >> 1. Basic statistics of WAL activity
> >>
> >> - wal_records: Total number of WAL records generated
> >> - wal_fpi: Total number of WAL full page images generated
> >> - wal_bytes: Total amount of WAL bytes generated
> >>
> >> To understand DB's performance, first, we will check the performance
> >> trends for the entire database instance.
> >> For example, if the number of wal_fpi becomes higher, users may tune
> >> "wal_compression", "checkpoint_timeout" and so on.
> >>
> >> Although users can check the above statistics via EXPLAIN,
> >> auto_explain,
> >> autovacuum and pg_stat_statements now,
> >> if users want to see the performance trends  for the entire database,
> >> they must recalculate the statistics.
> >>
> >
> > Here, do you mean to say 'entire cluster' instead of 'entire database'
> > because it seems these stats are getting collected for the entire
> > cluster?
>
> Thanks for your comments.
> Yes, I wanted to say 'entire cluster'.
>
> >> I think it is useful to add the sum of the basic statistics.
> >>
> >
> > There is an argument that it is better to view these stats at the
> > statement-level so that one can know which statements are causing most
> > WAL and then try to rate-limit them if required in the application and
> > anyway they can get the aggregate of all the WAL if they want. We have
> > added these stats in PG-13, so do we have any evidence that the
> > already added stats don't provide enough information? I understand
> > that you are trying to display the accumulated stats here which if
> > required users/DBA need to compute with the currently provided stats.
> > OTOH, sometimes adding more ways to do some things causes difficulty
> > for users to understand and learn.
>
> I agree that statement-level stats are important, and I understand that we
> can get the aggregated WAL stats from the pg_stat_statements view and from
> autovacuum's log.
> But currently, WAL stats generated by autovacuum are only output to the
> logs, and it is not easy to aggregate them. Since WAL writes impact the
> entire cluster, I thought it natural to provide accumulated values.
>

I think it is the other way around, i.e., if we have accumulated stats then
it makes sense to also provide them at the statement level, because one
would like to know the exact cause of increased WAL activity. Say it is due
to autovacuum or to a particular set of statements; then it would be easier
for users to do something about it.
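To make the two levels of detail concrete, here is a rough sketch of how
each would be consulted (the pg_stat_wal column names are those proposed
upthread and may change; the pg_stat_statements WAL columns exist as of
PG13):

-- Cluster-wide accumulated WAL activity, as the proposed view would expose it:
SELECT wal_records, wal_fpi, wal_bytes FROM pg_stat_wal;

-- Statement-level attribution, available since PG13 via pg_stat_statements:
SELECT query, wal_records, wal_fpi, wal_bytes
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 5;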

-- 
With Regards,
Amit Kapila.




Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Abhijit Menon-Sen
At 2020-10-20 10:53:04 -0700, and...@anarazel.de wrote:
>
> > postgres=# ALTER SYSTEM SET max_worker_processes += 4;
> > ALTER SYSTEM
> 
> Much less clear that this is a good idea...

I agree it's less clear. I still think it might be useful in some cases
(such as the example with max_worker_processes quoted above), but it's
not as compelling as altering search_path/shared_preload_libraries.

(That's partly why I posted it as a separate patch.)
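For readers skimming the thread, a sketch of the proposed syntax across the
SET/ALTER variants (the role name and library are illustrative, and the
numeric case is exactly the part being questioned):

-- Append to a list-valued setting in the current session:
SET search_path += octopus;

-- The same syntax for persistent settings:
ALTER SYSTEM SET shared_preload_libraries += pg_stat_statements;
ALTER ROLE octo SET search_path += octopus;

-- Increment a numeric setting, the case whose usefulness is debated above:
ALTER SYSTEM SET max_worker_processes += 4;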

> It seems to me that appending and incrementing using the same syntax
> is a) confusing b) will be a limitation before long.

I understand (a), but what sort of limitation do you foresee in (b)?

Do you think both features should be implemented, but with a different
syntax, or are you saying incrementing should not be implemented now?

> > These patches do not affect configuration file parsing in any way:
> > its use is limited to "SET" and "ALTER xxx SET".
> 
> Are you including user / database settings as part of ALTER ... SET?
> Or just SYSTEM?

Yes, it works the same for all of the ALTER … SET variants, including
users and databases.

-- Abhijit




Re: parallel distinct union and aggregate support patch

2020-10-20 Thread Thomas Munro
On Tue, Oct 20, 2020 at 3:49 AM bu...@sohu.com  wrote:
> I wrote a path to support parallel distinct, union and aggregate using
> batch sort.
> Steps:
>  1. Generate a hash value for the group clause values, and use the hash
> value modulo the number of batches to assign each tuple to a batch.
>  2. At the end of the outer plan, wait for all other workers to finish
> writing to the batches.
>  3. Each worker gets a unique batch number and calls
> tuplesort_performsort() to finish sorting that batch.
>  4. Return the rows for this batch.
>  5. If not all batches are done, go to step 3.
>
> The BatchSort plan makes sure tuples with the same group clause values are
> returned in the same range, so a Unique (or GroupAggregate) plan can work.

Hi!

Interesting work!  In the past a few people have speculated about a
Parallel Repartition operator that could partition tuples a bit like
this, so that each process gets a different set of partitions.  Here
you combine that with a sort.  By doing both things in one node, you
avoid a lot of overheads (writing into a tuplestore once in the
repartitioning node, and then once again in the sort node, with tuples
being copied one-by-one between the two nodes).

If I understood correctly, the tuples emitted by Parallel Batch Sort
in each process are ordered by (hash(key, ...) % npartitions, key,
...), but the path is claiming to be ordered by (key, ...), no?
That's enough for Unique and Aggregate to give the correct answer,
because they really only require equal keys to be consecutive (and in
the same process), but maybe some other plan could break?
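For context, these are the kinds of queries the proposed BatchSort path is
meant to serve; the table and column names below are purely illustrative:

SET max_parallel_workers_per_gather = 4;

-- Parallel distinct, union and grouped aggregate over large inputs:
SELECT DISTINCT customer_id FROM orders;
SELECT customer_id FROM orders UNION SELECT customer_id FROM old_orders;
SELECT customer_id, count(*) FROM orders GROUP BY customer_id;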




Re: Track statistics for streaming of in-progress transactions

2020-10-20 Thread Amit Kapila
On Wed, Oct 21, 2020 at 8:15 AM Masahiko Sawada
 wrote:
>
> On Tue, 20 Oct 2020 at 14:29, Amit Kapila  wrote:
> >
> >
> > Thanks. One thing I have considered while updating this patch was to
> > write a test case similar to what we have for spilled stats in
> > test_decoding/sql/stats.sql but I decided not to do it as that doesn't
> > seem to add much value for the streaming case because we already have
> > some tests in test_decoding/sql/stream.sql which indicates that the
> > streaming is happening. If we could have a way to get the exact
> > streaming stats then it would have been better but while writing tests
> > for spilled stats we found that it is not possible because some
> > background transactions (like autovacuum) might send the stats earlier
> > making the actual number inconsistent. What do you think?
> >
> > Sawada-San, do you have any thoughts on this matter?
>
> I basically agree with that. Reading the patch, I have a question that
> might be relevant to this matter:
>
> The patch has the following code:
>
> +   /*
> +* Remember this information to be used later to update stats. We can't
> +* update the stats here as an error while processing the changes would
> +* lead to the accumulation of stats even though we haven't streamed all
> +* the changes.
> +*/
> +   txn_is_streamed = rbtxn_is_streamed(txn);
> +   stream_bytes = txn->total_size;
>
> The comment seems to mention only the case where an error happened while
> processing the changes, but I wonder if the same is true for an aborted
> transaction. That is, if we catch an error due to a concurrent transaction
> abort while processing the changes, we stop streaming the changes. But the
> patch accumulates the stats even in this case.
>

It would only add the stats for the current stream, and I don't think that
is wrong, because we would have sent some data (at least the start message)
for which we send the stream_stop message later while decoding the Abort
message. I had thought that, to avoid this, we could update the stats in
ReorderBufferProcessTXN at the end, when we know streaming is complete, but
that would miss the counter update for the data we have sent before an error
occurred, and updating the streaming counters in ReorderBufferStreamTXN
seems more logical to me anyway.

> If we
> don't want to accumulate the stats of the aborted transaction and it's
> easily reproducible, it might be better to add a test checking that we
> don't accumulate in that case.
>

But as explained above, I think we should count it, since we would have sent
at least one message (possibly more) before we encounter this error.
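For reference, a sketch of how the counters under discussion would be
inspected and reset, using the view and reset function this patch extends
(the stream_* column names and the slot name are my guesses at the final
spelling and may differ in the committed version):

-- Streaming counters per replication slot:
SELECT slot_name, stream_txns, stream_count, stream_bytes
FROM pg_stat_replication_slots;

-- Reset one slot's counters, or pass NULL to reset all slots:
SELECT pg_stat_reset_replication_slot('regression_slot');
SELECT pg_stat_reset_replication_slot(NULL);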


-- 
With Regards,
Amit Kapila.




Re: Resetting spilled txn statistics in pg_stat_replication

2020-10-20 Thread Amit Kapila
On Tue, Oct 13, 2020 at 10:33 AM Amit Kapila  wrote:
>
> On Tue, Oct 13, 2020 at 10:21 AM Tom Lane  wrote:
> >
> >
> > I know I can go read the source code, but most users will not want to.
> > Is the documentation in monitoring.sgml really sufficient?  If we can't
> > explain this with more precision, is it really a number we want to expose
> > at all?
> >
>
> This counter is important to give users an idea about the amount of
> I/O we incur during decoding and to tune logical_decoding_work_mem
> GUC. So, I would prefer to improve the documentation for this
> variable.
>

I have modified the description of spill_count and spill_txns to make
things clear. Any suggestions?
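As a sketch of how a user would act on these counters (spill_bytes and the
new memory value are illustrative assumptions, not recommendations):

-- If spill_count and spill_bytes keep growing, decoded transactions are
-- overflowing logical_decoding_work_mem and spilling to disk:
SELECT slot_name, spill_txns, spill_count, spill_bytes
FROM pg_stat_replication_slots;

-- One possible response is to raise the decoding memory limit:
ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
SELECT pg_reload_conf();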

-- 
With Regards,
Amit Kapila.


v1-0001-Update-description-of-spilled-counters-in-pg_stat.patch
Description: Binary data


Re: Track statistics for streaming of in-progress transactions

2020-10-20 Thread Masahiko Sawada
On Tue, 20 Oct 2020 at 14:29, Amit Kapila  wrote:
>
> On Mon, Oct 19, 2020 at 1:52 PM Ajin Cherian  wrote:
> >
> > On Wed, Oct 14, 2020 at 2:39 PM Amit Kapila  wrote:
> > >
> > > Commit 464824323e has added the support of the streaming of
> > > in-progress transactions into the built-in logical replication. The
> > > attached patch adds the statistics about transactions streamed to the
> > > decoding output plugin from ReorderBuffer. Users can query the
> > > pg_stat_replication_slots view to check these stats and call
> > > pg_stat_reset_replication_slot to reset the stats of a particular
> > > slot. Users can pass NULL in pg_stat_reset_replication_slot to reset
> > > stats of all the slots.
> > >
> > > Commit 9868167500 has added the basic infrastructure to capture the
> > > stats of slot and this commit extends the statistics collector to
> > > track additional information about slots.
> > >
> > > This patch was originally written by Ajin Cherian [1]. I have fixed
> > > bugs and modified some comments in the code.
> > >
> > > Thoughts?
> > >
> > > [1] - 
> > > https://www.postgresql.org/message-id/CAFPTHDZ8RnOovefzB%2BOMoRxLSD404WRLqWBUHe6bWqM5ew1bNA%40mail.gmail.com
> >
> > I've applied the patch. It applies cleanly. I've reviewed the patch
> > and have no comments to report.
> > I have also run some tests to get streaming stats as well as reset the
> > stats counter, everything seems to be working as expected.
> > I am fine with the changes.
> >
>
> Thanks. One thing I have considered while updating this patch was to
> write a test case similar to what we have for spilled stats in
> test_decoding/sql/stats.sql but I decided not to do it as that doesn't
> seem to add much value for the streaming case because we already have
> some tests in test_decoding/sql/stream.sql which indicates that the
> streaming is happening. If we could have a way to get the exact
> streaming stats then it would have been better but while writing tests
> for spilled stats we found that it is not possible because some
> background transactions (like autovacuum) might send the stats earlier
> making the actual number inconsistent. What do you think?
>
> Sawada-San, do you have any thoughts on this matter?

I basically agree with that. Reading the patch, I have a question that
might be relevant to this matter:

The patch has the following code:

+   /*
+* Remember this information to be used later to update stats. We can't
+* update the stats here as an error while processing the changes would
+* lead to the accumulation of stats even though we haven't streamed all
+* the changes.
+*/
+   txn_is_streamed = rbtxn_is_streamed(txn);
+   stream_bytes = txn->total_size;

The comment seems to mention only the case where an error happened while
processing the changes, but I wonder if the same is true for an aborted
transaction. That is, if we catch an error due to a concurrent transaction
abort while processing the changes, we stop streaming the changes. But the
patch accumulates the stats even in this case. If we don't want to
accumulate the stats of the aborted transaction and it's easily
reproducible, it might be better to add a test checking that we don't
accumulate in that case.

Regards,

-- 
Masahiko Sawada    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Reduce the dependence on access/xlog_internal.h

2020-10-20 Thread Mark Dilger


> On Oct 19, 2020, at 7:05 PM, Andres Freund  wrote:
> 
> Hi,
> 
> On 2020-10-19 18:29:27 -0700, Mark Dilger wrote:
>> Please find access/xlog_internal.h refactored in the attached patch
>> series.  This header is included from many places, including external
>> tools.  It is aesthetically displeasing to have something called
>> "internal" used from so many places, especially when many of those
>> places do not deal directly with the internal workings of xlog.  But
>> it is even worse that multiple files include this header for no
>> reason.
> 
> 
>> 0002 - Moves RmgrData from access/xlog_internal.h into a new file 
>> access/rmgr_internal.h.  I clearly did not waste time thinking of a clever 
>> file name.  Bikeshedding welcome.  Most files which currently include 
>> xlog_internal.h do not need the definition of RmgrData.  As it stands now, 
>> inclusion of xlog_internal.h indirectly includes the following headers:
>> 
>> After refactoring, the inclusion of xlog_internal.h includes indirectly only 
>> these headers:
>> 
>> and only these files need to be altered to include the new rmgr_internal.h 
>> header:
>> 
>>src/backend/access/transam/rmgr.c
>>src/backend/access/transam/xlog.c
>>src/backend/utils/misc/guc.c
>> 
>> Thoughts?
> 
> It's not clear why the correct direction here is to make
> xlog_internal.h less "low level" by moving things into headers like
> rmgr_internal.h, rather than moving the widely used parts of
> xlog_internal.h elsewhere.

Thanks for reviewing!

>> A small portion of access/xlog_internal.h defines the RmgrData struct,
>> and in support of this struct the header includes a number of other
>> headers.  Files that include access/xlog_internal.h indirectly include
>> these other headers, which most do not need. (Only 3 out of 41 files
>> involved actually need that portion of the header.)  For third-party
>> tools which deal with backup, restore, or replication matters,
>> including xlog_internal.h is necessary to get macros for calculating
>> xlog file names, but doing so also indirectly pulls in other headers,
>> increasing the risk of unwanted symbol collisions.  Some colleagues
>> and I ran into this exact problem in a C++ program that uses both
>> xlog_internal.h and the Boost C++ library.
> 
> It seems better to me to just use forward declarations for StringInfo
> and XLogReaderState (and just generally use them more aggressively). We
> don't need the functions for dealing with those datatypes here.

Yeah, those are good points.  Please find attached version 2 of the patch set 
which preserves the cleanup of the first version's 0001 patch, and introduces 
two new patches, 0002 and 0003:

0002 - Moves commonly used stuff from xlog_internal.h into other headers

0003 - Uses forward declarations for StringInfo and XLogReaderState so as to 
not need to include lib/stringinfo.h nor access/xlogreader.h from 
access/xlog_internal.h



v2-0001-Removing-gratuitous-inclusion-of-xlog_internal.h.patch
Description: Binary data


v2-0002-Refactoring-access-xlog_internal.h.patch
Description: Binary data


v2-0003-Using-forward-declarations-to-reduce-including-he.patch
Description: Binary data

-- 
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Robert Haas
On Tue, Oct 20, 2020 at 3:24 PM Andres Freund  wrote:
> As far as I understand what the proposal does, if you were to do an
> ALTER SYSTEM like you do, it'd actually write an "absolute"
> shared_preload_libraries value to postgresql.auto.conf. So if you then
> subsequently modified the shared_preload_libraries in postgresql.conf
> it'd be overwritten by the postgresql.auto.conf value, not "newly"
> appended to.

Right. So, it only really works if you either use ALTER SYSTEM for
everything or manual config file edits for everything. But, that can
still be useful.
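A sketch of the pitfall being described, assuming the proposed += syntax
(auto_explain here is just an example library):

-- At ALTER SYSTEM time, += is expanded to a complete value, and that full
-- list is written to postgresql.auto.conf:
ALTER SYSTEM SET shared_preload_libraries += auto_explain;

-- From then on, postgresql.auto.conf carries the whole list, so any later
-- hand edit of shared_preload_libraries in postgresql.conf is overridden
-- rather than appended to.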

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: [Patch] Optimize dropping of relation buffers using dlist

2020-10-20 Thread tsunakawa.ta...@fujitsu.com
From: tsunakawa.ta...@fujitsu.com 
> Can you measure the time spent in DropRelFileNodeBuffers()?  You can call
> GetTimestamp() at the beginning and end of the function, and use
> TimestampDifference() to calculate the difference.  Then, for instance,
> elog(WARNING, "time is | %u.%u", sec, usec) at the end of the function.  You
> can use any elog() print format that makes it convenient to write shell
> commands to filter the lines and sum up the total.

Before doing this, you can also do "VACUUM (truncate off)" to see which of
garbage collection or relation truncation takes a long time.  The relation
truncation processing includes not only DropRelFileNodeBuffers() but also file
truncation and other work, but it's an easy way to narrow things down.
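A minimal sketch of that suggestion (the table name is a placeholder):

-- Vacuum without the trailing relation truncation, to separate the cost of
-- garbage collection from the truncation path that calls
-- DropRelFileNodeBuffers():
VACUUM (TRUNCATE off, VERBOSE) some_table;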


Regards
Takayuki Tsunakawa





Re: pg_upgrade: fail early if a tablespace dir already exists for new cluster version

2020-10-20 Thread Tom Lane
Justin Pryzby  writes:
> I wonder if pg_upgrade should try to rmdir() the tablespace dirs before
> restoring global objects, allowing it to succeed, rather than just "failing
> early".

I'm a little confused about that.  If the directories aren't empty,
that will fail, but if they are, shouldn't the upgrade just work?
initdb is not normally unhappy about the target directory existing
if it's empty.

The reason why rmdir-and-recreate is not a great substitute for
"it just works" is that (a) you may lose the intended ownership or
permissions of those dirs, (b) you may lack write permission on their
parent dirs.  So cases where the DBA has pre-created the directories
are important to support.

regards, tom lane




RE: Resetting spilled txn statistics in pg_stat_replication

2020-10-20 Thread Shinoda, Noriyoshi (PN Japan A Delivery)
> I have pushed the patch after updating the test_decoding/sql/stats.
> The corresponding changes in the test were missing.

Thank you very much for your help!

Regards,
Noriyoshi Shinoda

-Original Message-
From: Masahiko Sawada [mailto:masahiko.saw...@2ndquadrant.com] 
Sent: Tuesday, October 20, 2020 9:24 PM
To: Amit Kapila 
Cc: Shinoda, Noriyoshi (PN Japan A Delivery) ; 
Dilip Kumar ; Magnus Hagander ; 
Tomas Vondra ; PostgreSQL Hackers 
; Ajin Cherian 
Subject: Re: Resetting spilled txn statistics in pg_stat_replication

On Tue, 20 Oct 2020 at 20:11, Amit Kapila  wrote:
>
> On Mon, Oct 19, 2020 at 1:20 PM Masahiko Sawada 
>  wrote:
> >
> > On Mon, 19 Oct 2020 at 14:24, Amit Kapila  wrote:
> > >
> >
> > > So, let's
> > > stick with 'text' data type for slot_name which means we should 
> > > go-ahead with the v3 version of the patch [1] posted by 
> > > Shinoda-San, right?
> >
> > Right. +1
> >
>
> I have pushed the patch after updating the test_decoding/sql/stats.
> The corresponding changes in the test were missing.

Thank you!

Regards,

-- 
Masahiko Sawada    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2020-10-20 Thread Alvaro Herrera
On 2020-Oct-20, Justin Pryzby wrote:

> On Tue, Oct 20, 2020 at 03:56:30PM -0400, Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > Hmm, next question: should we backpatch a fix for this?  (This applies
> > > all the way back to 11.)  If we do, then we would change behavior of
> > > partition creation.  It's hard to see that the current behavior is
> > > desirable ... and I think anybody who would have come across this, would
> > > wish it behaved the other way.  But still -- it would definitely be a
> > > behavior change.
> > 
> > The behavior change seems like it'd be an improvement in a vacuum,
> > but I wonder how it would interact with catalog contents left behind
> > by the old misbehavior.  Also, would we expect pg_dump to try to do
> > anything to clean up the mess?  If so, allowing a back branch to have
> > had more than one behavior would complicate that greatly.
> 
> I don't think there's a problem with the catalog contents?
> I think it's fine if there's an enabled child trigger inheriting from a
> disabled parent?  This changes the initial tgenabled for new partitions.

I don't think we'd need to do anything special here ... particularly
considering the discovery that pg_dump does not preserve the disabled
status of triggers on partitions:

> However... it looks like pg_dump should ALTER the child trigger state if it
> differs from its parent.  Or maybe it needs to CREATE child triggers with the
> proper state before attaching the child table?

I guess *something* needs to be done, but I'm not clear on what it is.
Creating the trigger on the partition beforehand does not work: an error is
raised on ATTACH saying that the trigger already exists.

The only way I see to do this is to have pg_dump extract tgenabled for
all child triggers that don't have the same tgenabled as the parent,
and append ALTER .. DISABLE commands for each one to the parent table's
trigger creation command.  So pg_dump.c's getTriggers would have to
obtain the list of altered child triggers, and then dumpTrigger would
have to append the ALTER TABLE ONLY .. ENABLE/DISABLE
command for that particular trigger.
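A rough sketch of what that would look like, assuming PG13-style catalogs
where pg_trigger.tgparentid links a partition's trigger to its parent
(earlier branches would need a different lookup); the table and trigger
names come from the reproduction quoted elsewhere in this thread:

-- What getTriggers() would need: child triggers whose tgenabled differs
-- from their parent trigger's.
SELECT c.tgrelid::regclass AS partition, c.tgname, c.tgenabled
FROM pg_trigger c
JOIN pg_trigger p ON p.oid = c.tgparentid
WHERE c.tgenabled != p.tgenabled;

-- ...and the command dumpTrigger() would then append for each such trigger:
ALTER TABLE ONLY t1 DISABLE TRIGGER tg;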





Re: ECPG gets embedded quotes wrong

2020-10-20 Thread Tom Lane
I wrote:
> It looks to me like a sufficient fix is just to keep these quote
> sequences as-is within a converted string, so that the attached
> appears to fix it.

Poking at this further, I noticed that there's a semi-related bug
that this patch changes the behavior for, without fixing it exactly.
That has to do with use of a string literal as "execstring" in ECPG's
PREPARE ... FROM and EXECUTE IMMEDIATE commands.  Right now, it
appears that there is simply no way to write a double quote as part
of the SQL command in this context.  The EXECUTE IMMEDIATE docs say
that such a literal is a "C string", so one would figure that \"
(backslash-double quote) is the way, but that just produces syntax
errors.  The reason is that ECPG's lexer is in SQL mode at this point
so it thinks the double-quoted string is a SQL quoted identifier, in
which backslash isn't special so the double quote terminates the
identifier.  Ooops.  Knowing this, you might try writing two double
quotes, but that doesn't work either, because the {xddouble}
lexer rule converts that to one double quote, and you end up with
an unterminated literal in the translated C code rather than in the
ECPG input.

My patch above modifies this to the extent that two double quotes
come out as two double quotes in the translated C code, but that
just results in nothing at all, since the C compiler sees adjacent
string literals, which the C standard commands it to concatenate.
Then you probably get a mysterious syntax error from the backend
because it thinks your intended-to-be SQL quoted identifier isn't
quoted.  However, this is the behavior a C programmer would expect
for adjacent double quotes in a literal, so maybe people wouldn't
see it as mysterious.

Anyway, what to do?

1. Nothing, except document that you can't put a double quote into
the C string literal in these commands.

2. Make two-double-quotes work to produce a data double quote,
which I think could be done fairly easily with some post-processing
in the execstring production.  However, this doesn't have much to
recommend it other than being easily implementable.  C programmers
would not think it's natural, and the fact that backslash sequences
other than \" would work as a C programmer expects doesn't help.

3. Find a way to lex the literal per C rules, as the EXECUTE IMMEDIATE
docs clearly imply we should.  (The PREPARE docs are silent on the
point AFAICS.)  Unfortunately, this seems darn near impossible unless
we want to make IMMEDIATE (more) reserved.  Since it's currently
unreserved, the grammar can't tell which flavor of EXEC SQL EXECUTE ...
it's dealing with until it looks ahead past the name-or-IMMEDIATE token,
so that it must lex the literal (if any) too soon.  I tried putting in a
mid-rule action to switch the lexer back to C mode but failed because of
that ambiguity.  Maybe we could make it work with a bunch of refactoring,
but it would be ugly and subtle code, in both the grammar and lexer.

On the whole I'm inclined to go with #1.  There's a reason why nobody has
complained about this in twenty years, which is that the syntaxes with
a string literal are completely useless.  There's no point in writing
EXEC SQL EXECUTE IMMEDIATE "SQL-statement" when you can just write
EXEC SQL SQL-statement, and similarly for PREPARE.  (The other variant
that takes the string from a C variable is useful, but that one doesn't
have any weird quoting problem.)  So I can't see expending the effort
for #3, and I don't feel like adding and documenting the wart of #2.

Thoughts?

regards, tom lane




Re: speed up unicode decomposition and recomposition

2020-10-20 Thread Michael Paquier
On Tue, Oct 20, 2020 at 08:03:12AM -0400, John Naylor wrote:
> I've confirmed that. How about a new header unicode_norm_hashfunc.h which
> would include unicode_norm_table.h at the top. In unicode.c, we can include
> one of these depending on frontend or backend.

Sounds good to me.  Looking at the code, I would just generate the
second file within generate-unicode_norm_table.pl.
--
Michael




Re: speed up unicode normalization quick check

2020-10-20 Thread Michael Paquier
On Tue, Oct 20, 2020 at 05:33:43AM -0400, John Naylor wrote:
> This is cleaner, so I'm good with this.

Thanks.  Applied this way, then.
--
Michael




Re: dynamic result sets support in extended query protocol

2020-10-20 Thread Dave Cramer
On Tue, 20 Oct 2020 at 20:09, Andres Freund  wrote:

> Hi,
>
> On 2020-10-20 18:55:41 -0500, Jack Christensen wrote:
> > Upthread someone posted a page pgjdbc detailing desired changes to the
> > backend protocol (
> >
> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md
> ).
>
> A lot of the stuff on there seems way beyond what can be achieved in
> something incrementally added to the protocol. Fair enough in an article
> about "v4" of the protocol. But I don't think we are - nor should we be
> - talking about a full new protocol version here. Instead we are talking
> about extending the protocol, where the extensions are opt-in.
>

You are correct that we are not talking about a whole new protocol, but why
not? It seems to me we would have a lot more latitude to get it right if we
didn't have this limitation.

Dave

>
>


Re: dynamic result sets support in extended query protocol

2020-10-20 Thread Andres Freund
Hi,

On 2020-10-20 18:55:41 -0500, Jack Christensen wrote:
> Upthread someone posted a page pgjdbc detailing desired changes to the
> backend protocol (
> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md).

A lot of the stuff on there seems way beyond what can be achieved in
something incrementally added to the protocol. Fair enough in an article
about "v4" of the protocol. But I don't think we are - nor should we be
- talking about a full new protocol version here. Instead we are talking
about extending the protocol, where the extensions are opt-in.

Greetings,

Andres Freund




Re: dynamic result sets support in extended query protocol

2020-10-20 Thread Jack Christensen
Regarding decoding binary vs text performance: There can be a significant
performance cost to fetching the binary format over the text format for
types such as text. See
https://www.postgresql.org/message-id/CAMovtNoHFod2jMAKQjjxv209PCTJx5Kc66anwWvX0mEiaXwgmA%40mail.gmail.com
for the previous discussion.

From the pgx driver (https://github.com/jackc/pgx) perspective:

A "use binary for these types" message sent once at the beginning of the
session would not only be helpful for dynamic result sets but could
simplify use of the extended protocol in general.

Upthread someone posted a pgjdbc page detailing desired changes to the
backend protocol (
https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md).
I concur with almost everything there, but in particular the first
suggestion of the backend automatically converting binary values like it
does text values would be huge. That combined with the "use binary for
these types" message could greatly simplify the driver side work in using
the binary format.

CommandComplete vs ReadyForQuery -- pgx does the same as Npgsql in that it
batches multiple queries together in the extended protocol and uses
CommandComplete for statement completion and ReadyForQuery for batch
completion.



On Tue, Oct 20, 2020 at 9:28 AM Shay Rojansky  wrote:

> Very interesting conversation, thanks for including me Dave. Here are some
> thoughts from the Npgsql perspective,
>
> Re the binary vs. text discussion... A long time ago, Npgsql became a
> "binary-only" driver, meaning that it never sends or receives values in
> text encoding, and practically always uses the extended protocol. This was
> because in most (all?) cases, encoding/decoding binary is more efficient,
> and maintaining two encoders/decoders (one for text, one for binary) made
> less and less sense. So by default, Npgsql just requests "all binary" in
> all Bind messages it sends (there's an API for the user to request text, in
> which case they get pure strings which they're responsible for parsing).
> Binary handling is implemented for almost all PG types which support it,
> and I've hardly seen any complaints about this for the last few years. I'd
> be interested in any arguments against this decision (Dave, when have you
> seen that decoding binary is slower than decoding text?).
>
> Given the above, allowing the client to specify in advance which types
> should be in binary sounds good, but wouldn't help Npgsql much (since by
> default it already requests binary for everything). It would slightly help
> in allowing binary-unsupported types to automatically come back as text
> without manual user API calls, but as I wrote above this is an extremely
> rare scenario that people don't care much about.
>
> > Is there really a good reason for forcing the client to issue
> NextResult, Describe, Execute for each of the dynamic result sets?
>
> I very much agree - it should be possible to execute a procedure and
> consume all results in a single roundtrip, otherwise this is quite a perf
> killer.
>
> Peter, from your original message:
>
> > Following the model from the simple query protocol, CommandComplete
> really means one result set complete, not the whole top-level command.
> ReadyForQuery means the whole command is complete. This is perhaps
> debatable, and interesting questions could also arise when considering what
> should happen in the simple query protocol when a query string consists of
> multiple commands each returning multiple result sets. But it doesn't
> really seem sensible to cater to that
>
> Npgsql implements batching of multiple statements via the extended
> protocol in a similar way. In other words, the .NET API allows users to
> pack multiple SQL statements and execute them in one roundtrip, and Npgsql
> does this by sending
> Parse1/Bind1/Describe1/Execute1/Parse2/Bind2/Describe2/Execute2/Sync. So
> CommandComplete signals completion of a single statement in the batch,
> whereas ReadyForQuery signals completion of the entire batch. This means
> that the "interesting questions" mentioned above are possibly relevant to
> the extended protocol as well.
>


Re: pg_upgrade: fail early if a tablespace dir already exists for new cluster version

2020-10-20 Thread Justin Pryzby
On Thu, Oct 15, 2020 at 07:35:30PM -0400, Bruce Momjian wrote:
> On Fri, Oct  9, 2020 at 07:42:51PM -0400, Bruce Momjian wrote:
> > On Fri, Oct  9, 2020 at 02:23:10PM -0500, Justin Pryzby wrote:
> > > In my local branch, I had revised this comment to say:
> > > 
> > > + * Note, v8.4 has no tablespace_suffix, which is fine so long as the 
> > > version we
> > > + * being upgraded *to* has a suffix, since it's not allowed to 
> > > pg_upgrade from
> > > + * a version to the same version if tablespaces are in use.
> > 
> > OK, updated patch attached.  Also, from your original patch, I didn't
> > need to call canonicalize_path() since we are not comparing paths, and I
> > didn't need to include common/relpath.h.  I also renamed a variable.
> 
> Patch applied to all supported versions.  Thanks for the report, and the

I wonder if pg_upgrade should try to rmdir() the tablespace dirs before
restoring global objects, allowing it to succeed, rather than just "failing
early".  That can avoid the need to re-create the new cluster, which I imagine
in some scenarios might be bad enough to require aborting the upgrade.
I propose this only for the master branch.

This doesn't avoid the need to re-create the new cluster if anything has been
restored into it (eg. if pg_tablespace is populated), it just cleans out any
empty dirs left behind by a previous pg_upgrade which failed after "restoring
global objects" but before copying/linking data into that tablespace.

|rm -fr pgsql12.dat pgsql14.dat tblspc;
|/usr/lib/postgresql/12/bin/initdb -D pgsql12.dat
|./tmp_install/usr/local/pgsql/bin/initdb -D pgsql14.dat
|mkdir tblspc tblspc/PG_14_202010201
|echo "CREATE TABLESPACE one LOCATION '`pwd`/tblspc'" |/usr/lib/postgresql/12/bin/postgres --single postgres -D pgsql12.dat -p 5678 -k /tmp
|./tmp_install/usr/local/pgsql/bin/pg_upgrade -D pgsql14.dat/ -d pgsql12.dat -b /usr/lib/postgresql/12/bin

-- 
Justin
>From 2eb5d39bc38474f3fe956d0f3344c47257133afd Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 20 Oct 2020 17:30:54 -0500
Subject: [PATCH v3] Avoid failure if tablespace dirs exist and empty..

See also: 3c0471b5fdd847febb30ec805a3e44f3eac5b66f
---
 src/bin/pg_upgrade/check.c  | 23 ---
 src/bin/pg_upgrade/pg_upgrade.c | 13 +
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 05e6bf7f2c..57f9fa9ad6 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -531,9 +531,9 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 
 
 /*
- * A previous run of pg_upgrade might have failed and the new cluster
- * directory recreated, but they might have forgotten to remove
- * the new cluster's tablespace directories.  Therefore, check that
+ * A previous run of pg_upgrade might have failed.
+ * The admin might have recreated the new cluster directory, but forgotten to
+ * remove the new cluster's tablespace directories.  Therefore, check that
  * new cluster tablespace directories do not already exist.  If
  * they do, it would cause an error while restoring global objects.
  * This allows the failure to be detected at check time, rather than
@@ -554,15 +554,24 @@ check_for_new_tablespace_dir(ClusterInfo *new_cluster)
 
 	for (tblnum = 0; tblnum < os_info.num_old_tablespaces; tblnum++)
 	{
-		struct stat statbuf;
-
 		snprintf(new_tablespace_dir, MAXPGPATH, "%s%s",
 os_info.old_tablespaces[tblnum],
 new_cluster->tablespace_suffix);
 
-		if (stat(new_tablespace_dir, ) == 0 || errno != ENOENT)
-			pg_fatal("new cluster tablespace directory already exists: \"%s\"\n",
+		/*
+		 * New tablespace dirs are expected to not exist; if they
+		 * do exist but are empty, we avoid failing here, and remove them
+		 * before creating tablespaces while "restoring global objects".
+		 */
+		switch (pg_check_dir(new_tablespace_dir))
+		{
+			case 0: /* Nonextant */
+			case 1: /* Empty */
+break;
+			default:
+pg_fatal("new cluster tablespace directory already exists and is not empty: \"%s\"\n",
 	 new_tablespace_dir);
+		}
 	}
 
 	check_ok();
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 1bc86e4205..6b7f8a690a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -300,6 +300,19 @@ prepare_new_globals(void)
 	 */
 	set_frozenxids(false);
 
+	/*
+	 * Remove any *empty* tablespace dirs left behind by a previous, failed
+	 * upgrade.
+	 */
+	for (int tblnum = 0; tblnum < os_info.num_old_tablespaces; tblnum++)
+	{
+		char	new_tablespace_dir[MAXPGPATH];
+		snprintf(new_tablespace_dir, MAXPGPATH, "%s%s",
+os_info.old_tablespaces[tblnum],
+new_cluster.tablespace_suffix);
+		(void) rmdir(new_tablespace_dir);
+	}
+
 	/*
 	 * Now restore global objects (roles and tablespaces).
 	 */
-- 
2.17.0



Re: PostgresNode::backup uses spread checkpoint?

2020-10-20 Thread Michael Paquier
On Tue, Oct 20, 2020 at 11:13:34AM -0400, David Steele wrote:
> On 10/20/20 11:01 AM, Alvaro Herrera wrote:
>> I noticed a few days ago that method backup() in PostgresNode uses
>> pg_basebackup without specifying a checkpoint mode -- and the default is
>> a spread checkpoint, which may cause any tests that use that to take
>> slightly longer than the bare minimum.
>> 
>> I propose to make it use a fast checkpoint, as per the attached.
> 
> +1.

+1.

-   $self->host, '-p', $self->port, '--no-sync');
+   $self->host, '-p', $self->port, '-cfast', '--no-sync');

Some nits: I would recommend using the long option name, and listing
the option name and its value as two separate arguments of the
command.
--
Michael




Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2020-10-20 Thread Alvaro Herrera
On 2020-Oct-20, Alvaro Herrera wrote:

> > diff --git a/src/backend/commands/tablecmds.c 
> > b/src/backend/commands/tablecmds.c
> > index 511f015a86..c8d6f78da2 100644
> > --- a/src/backend/commands/tablecmds.c
> > +++ b/src/backend/commands/tablecmds.c
> > @@ -4321,6 +4321,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd 
> > *cmd,
> > case AT_DisableTrigAll:
> > case AT_DisableTrigUser:
> > ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
> > +   ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode, 
> > context);
> > pass = AT_PASS_MISC;
> > break;
> > case AT_EnableRule: /* ENABLE/DISABLE RULE variants 
> > */
> 
> I'll add tests for both cases and push to all branches 11+.

Pushed this part.




Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2020-10-20 Thread Justin Pryzby
On Tue, Oct 20, 2020 at 03:56:30PM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > Hmm, next question: should we backpatch a fix for this?  (This applies
> > all the way back to 11.)  If we do, then we would change behavior of
> > partition creation.  It's hard to see that the current behavior is
> > desirable ... and I think anybody who would have come across this, would
> > wish it behaved the other way.  But still -- it would definitely be a
> > behavior change.
> 
> The behavior change seems like it'd be an improvement in a vacuum,
> but I wonder how it would interact with catalog contents left behind
> by the old misbehavior.  Also, would we expect pg_dump to try to do
> anything to clean up the mess?  If so, allowing a back branch to have
> had more than one behavior would complicate that greatly.

I don't think there's a problem with the catalog contents?
I think it's fine if there's an enabled child trigger inheriting from a
disabled parent?  This changes the initial tgenabled for new partitions.

The old catalog state is still possible - it's what you'd get if you did
CREATE TABLE child PARTITION OF..; ALTER TABLE child DISABLE TRIGGER.

I don't think pg_dump needs to do anything special, since the restore will now
do what's wanted without added commands.  Note that pg_dump switched from
"PARTITION OF" to "ATTACH PARTITION" at commit 33a53130a.  That would handle
both on the server side.

However... it looks like pg_dump should ALTER the child trigger state if it
differs from its parent.  Or maybe it needs to CREATE child triggers with the
proper state before attaching the child table?

-- 
Justin




Re: Hash support for row types

2020-10-20 Thread Tom Lane
Robert Haas  writes:
> Do we need to worry about what happens if somebody modifies the
> opclass/opfamily definitions?

There's a lot of places that you can break by doing that.  I'm not
too concerned about it.

regards, tom lane




Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2020-10-20 Thread Alvaro Herrera
On 2020-Oct-16, Alvaro Herrera wrote:

> On 2020-Oct-16, Alvaro Herrera wrote:
> 
> > I also just noticed that ALTER TABLE ONLY recurses to children, which it
> > should not.
> 
> Apparently I wrote (bogus) bespoke code to handle recursion in
> EnableDisableTrigger instead of using ATSimpleRecursion.  This patch
> seems to fix this problem.

... but it affects legacy inheritance, which would be undesirable
because it has never recursed for that case.  So it needs to have a
relkind check here and only recurse if it's a new-style partitioned
table:

> diff --git a/src/backend/commands/tablecmds.c 
> b/src/backend/commands/tablecmds.c
> index 511f015a86..c8d6f78da2 100644
> --- a/src/backend/commands/tablecmds.c
> +++ b/src/backend/commands/tablecmds.c
> @@ -4321,6 +4321,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd 
> *cmd,
>   case AT_DisableTrigAll:
>   case AT_DisableTrigUser:
>   ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
> + ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode, 
> context);
>   pass = AT_PASS_MISC;
>   break;
>   case AT_EnableRule: /* ENABLE/DISABLE RULE variants 
> */

I'll add tests for both cases and push to all branches 11+.




Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2020-10-20 Thread Tom Lane
Alvaro Herrera  writes:
> Hmm, next question: should we backpatch a fix for this?  (This applies
> all the way back to 11.)  If we do, then we would change behavior of
> partition creation.  It's hard to see that the current behavior is
> desirable ... and I think anybody who would have come across this, would
> wish it behaved the other way.  But still -- it would definitely be a
> behavior change.

The behavior change seems like it'd be an improvement in a vacuum,
but I wonder how it would interact with catalog contents left behind
by the old misbehavior.  Also, would we expect pg_dump to try to do
anything to clean up the mess?  If so, allowing a back branch to have
had more than one behavior would complicate that greatly.

regards, tom lane




ECPG gets embedded quotes wrong

2020-10-20 Thread Tom Lane
A recent user complaint [1] led me to investigate what ECPG does with
embedded quotes (that is, quotes-meant-to-be-data) in SQL identifiers
and strings.  AFAICS, it gets it wrong.  For example, if you write
the literal 'abc''def' in an EXEC SQL command, that will come out the
other end as 'abc'def', triggering a syntax error in the backend.
Likewise, "abc""def" is reduced to "abc"def" which is wrong syntax.
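For reference, this is what those literals are supposed to mean in plain
SQL, which is what the converted string needs to preserve:

SELECT 'abc''def';        -- the string abc'def
SELECT 1 AS "abc""def";   -- a column whose name is abc"def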

It looks to me like a sufficient fix is just to keep these quote
sequences as-is within a converted string, so that the attached
appears to fix it.  I added some documentation too, since there
doesn't seem to be anything there now explaining how it's supposed
to work.

I doubt this is safely back-patchable, since anybody who's working
around the existing misbehavior (as I see sql/dyntest.pgc is doing)
would not appreciate it changing under them in a minor release.
But I think we can fix it in v14.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/CA%2B4qtLct1L%3DgUordX4c_AdctJ%2BvZBsebYYLBk18LX8dLHthktg%40mail.gmail.com

diff --git a/doc/src/sgml/ecpg.sgml b/doc/src/sgml/ecpg.sgml
index 6e3ca788f6..aa1499bcea 100644
--- a/doc/src/sgml/ecpg.sgml
+++ b/doc/src/sgml/ecpg.sgml
@@ -63,11 +63,22 @@ EXEC SQL ...;
 
These statements syntactically take the place of a C statement.
Depending on the particular statement, they can appear at the
-   global level or within a function.  Embedded
+   global level or within a function.
+  
+
+  
+   Embedded
SQL statements follow the case-sensitivity rules of
normal SQL code, and not those of C. Also they allow nested
C-style comments that are part of the SQL standard. The C part of the
program, however, follows the C standard of not accepting nested comments.
+   Embedded SQL statements likewise use SQL rules, not
+   C rules, for parsing quoted strings and identifiers.
+   (See  and
+respectively.  Note that
+   ECPG assumes that standard_conforming_strings
+   is on.)
+   Of course, the C part of the program follows C quoting rules.
   
 
   
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 466bbac6a7..e98aa6c486 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -623,11 +623,8 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 	}
 }
 
-{xqdouble}	{ addlitchar('\''); }
-{xqcquote}	{
-	addlitchar('\\');
-	addlitchar('\'');
-}
+{xqdouble}	{ addlit(yytext, yyleng); }
+{xqcquote}{ addlit(yytext, yyleng); }
 {xqinside}	{ addlit(yytext, yyleng); }
 {xeinside}  {
 	addlit(yytext, yyleng);
@@ -736,7 +733,7 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 	return UIDENT;
 }
 {xddouble}	{
-	addlitchar('"');
+	addlit(yytext, yyleng);
 }
 {xdinside}	{
 	addlit(yytext, yyleng);
diff --git a/src/interfaces/ecpg/test/expected/preproc-strings.c b/src/interfaces/ecpg/test/expected/preproc-strings.c
index e695007b13..1e50cd36c3 100644
--- a/src/interfaces/ecpg/test/expected/preproc-strings.c
+++ b/src/interfaces/ecpg/test/expected/preproc-strings.c
@@ -45,7 +45,7 @@ int main(void)
 #line 13 "strings.pgc"
 
 
-  { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "select 'abcdef' , N'abcdef' as foo , E'abc\\bdef' as \"foo\" , U&'d\\0061t\\0061' as U&\"foo\" , U&'d!+61t!+61' UESCAPE '!' , $foo$abc$def$foo$", ECPGt_EOIT, 
+  { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "select 'abc''d\\ef' , N'abc''d\\ef' as foo , E'abc''def' as \"foo\"\"bar\" , U&'d\\0061t\\0061' as U&\"foo\"\"bar\" , U&'d!+61t!+61' UESCAPE '!' , $foo$abc$def$foo$", ECPGt_EOIT, 
 	ECPGt_char,&(s1),(long)0,(long)1,(1)*sizeof(char), 
 	ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, 
 	ECPGt_char,&(s2),(long)0,(long)1,(1)*sizeof(char), 
diff --git a/src/interfaces/ecpg/test/expected/preproc-strings.stderr b/src/interfaces/ecpg/test/expected/preproc-strings.stderr
index dbc9e5c0b8..4c3a8eee5a 100644
--- a/src/interfaces/ecpg/test/expected/preproc-strings.stderr
+++ b/src/interfaces/ecpg/test/expected/preproc-strings.stderr
@@ -8,7 +8,7 @@
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_process_output on line 13: OK: SET
 [NO_PID]: sqlca: code: 0, state: 0
-[NO_PID]: ecpg_execute on line 15: query: select 'abcdef' , N'abcdef' as foo , E'abc\bdef' as "foo" , U&'d\0061t\0061' as U&"foo" , U&'d!+61t!+61' UESCAPE '!' , $foo$abc$def$foo$; with 0 parameter(s) on connection ecpg1_regression
+[NO_PID]: ecpg_execute on line 15: query: select 'abc''d\ef' , N'abc''d\ef' as foo , E'abc''d\\ef' as "foo""bar" , U&'d\0061t\0061' as U&"foo""bar" , U&'d!+61t!+61' UESCAPE '!' , $foo$abc$def$foo$; with 0 parameter(s) on connection ecpg1_regression
 [NO_PID]: sqlca: code: 0, state: 0
 [NO_PID]: ecpg_execute on line 15: using PQexec
 [NO_PID]: sqlca: code: 0, state: 0
@@ -16,15 +16,15 @@
 [NO_PID]: sqlca: code: 0, state: 0
 

Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2020-10-20 Thread Justin Pryzby
On Tue, Oct 20, 2020 at 04:04:20PM -0300, Alvaro Herrera wrote:
> On 2020-Sep-30, Justin Pryzby wrote:
> 
> > CREATE TABLE t(i int) PARTITION BY RANGE(i);
> > CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (1) TO (10);
> > CREATE OR REPLACE FUNCTION tgf() RETURNS trigger LANGUAGE plpgsql AS $$ 
> > begin raise exception 'except'; end $$;
> > CREATE TRIGGER tg AFTER INSERT ON t FOR EACH ROW EXECUTE FUNCTION tgf();
> > ALTER TABLE t1 DISABLE TRIGGER tg;
> > INSERT INTO t VALUES(1); -- inserts when trigger is disabled: good
> > ALTER TABLE t DISABLE TRIGGER tg;
> > CREATE TABLE t2 PARTITION OF t FOR VALUES FROM (10) TO (20);
> > 
> > postgres=# SELECT tgrelid::regclass, tgenabled FROM pg_trigger WHERE 
> > tgrelid::regclass::text IN ('t1','t2');
> >  tgrelid | tgenabled 
> > -+---
> >  t1  | D
> >  t2  | O
> > (2 rows)
> > 
> > I consider this a bug, but CreateTrigStmt doesn't have any "enabled" member
> > (since it's impossible to CREATE TRIGGER .. DISABLED), so I'm not sure where
> > the fix should be.
> 
> Hmm, next question: should we backpatch a fix for this?  (This applies
> all the way back to 11.)  If we do, then we would change behavior of
> partition creation.  It's hard to see that the current behavior is
> desirable ... and I think anybody who would have come across this, would
> wish it behaved the other way.  But still -- it would definitely be a
> behavior change.

+0.8 to backpatch.  To v13 if not further.

We don't normally disable triggers, otherwise I would say +1.

For context, I ran into this issue while migrating a customer to a new server
using pg_restore and a custom backup script which loops around pg_dump, and
handles partitioned tables differently depending on whether they're recent or historic.

Our backup job works well, but is technically a bit of a hack.  It doesn't do
the right thing (causes sql errors and pg_restore warnings) for inherited
indexes and, apparently, triggers.  Disabling the trigger was my 4th attempt to
handle an error restoring a specific table (mismatched column type between
parent dump and child dumped several days earlier).  I eventually (5th
or 6th attempt) dropped the parent trigger, created the child tables using
--section=pre-data, ALTERed a column to match, and then ran post-data and
attached it.

-- 
Justin




Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Andres Freund
Hi,

On 2020-10-20 14:39:42 -0400, Robert Haas wrote:
> On Tue, Oct 20, 2020 at 2:34 PM Alvaro Herrera  
> wrote:
> > On 2020-Oct-20, Tom Lane wrote:
> > > and the fact that we've gone twenty-odd years without similar
> > > previous proposals doesn't speak well for it being really useful.
> >
> > Actually, there's at least two threads about this:
> >
> > https://postgr.es/m/flat/86d2ceg611@jerry.enova.com
> > https://postgr.es/m/flat/74af1f60-34af-633e-ee8a-310d40c741a7%402ndquadrant.com
> 
> Yeah, I think this is clearly useful.

The proposal in this thread doesn't really allow what those mails
request. It explicitly just adds SET += to SQL, *not* to the config
file.


> ALTER SYSTEM
> shared_preload_libraries += direshark seems like a case with a lot of
> obvious practical application, and adding things to search_path seems
> compelling, too.

As far as I understand what the proposal does, if you were to do an
ALTER SYSTEM like you do, it'd actually write an "absolute"
shared_preload_libraries value to postgresql.auto.conf. So if you then
subsequently modified the shared_preload_libraries in postgresql.conf
it'd be overwritten by the postgresql.auto.conf value, not "newly"
appended to.

Greetings,

Andres Freund




Re: Support for NSS as a libpq TLS backend

2020-10-20 Thread Andres Freund
Hi,

On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote:
> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001
> From: Daniel Gustafsson 
> Date: Thu, 8 Oct 2020 18:44:28 +0200
> Subject: [PATCH] Support for NSS as a TLS backend v12
> ---
>  configure |  223 +++-
>  configure.ac  |   39 +-
>  contrib/Makefile  |2 +-
>  contrib/pgcrypto/Makefile |5 +
>  contrib/pgcrypto/nss.c|  773 +++
>  contrib/pgcrypto/openssl.c|2 +-
>  contrib/pgcrypto/px.c |1 +
>  contrib/pgcrypto/px.h |1 +

Personally I'd like to see this patch broken up a bit - it's quite
large. Several of the changes could easily be committed separately, no?


>  if test "$with_openssl" = yes ; then
> +  if test x"$with_nss" = x"yes" ; then
> +AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
> +  fi

Based on a quick look there's no similar error check for the msvc
build. Should there be?

>  
> +if test "$with_nss" = yes ; then
> +  if test x"$with_openssl" = x"yes" ; then
> +AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
> +  fi

Isn't this a repetition of the earlier check?


> +  CLEANLDFLAGS="$LDFLAGS"
> +  # TODO: document this set of LDFLAGS
> +  LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"

Shouldn't this use nss-config or such?


> +if test "$with_nss" = yes ; then
> +  AC_CHECK_HEADER(ssl.h, [], [AC_MSG_ERROR([header file  is required 
> for NSS])])
> +  AC_CHECK_HEADER(nss.h, [], [AC_MSG_ERROR([header file  is required 
> for NSS])])
> +fi

Hm. For me, on debian, these headers are not directly in the default
include search path, but would be as nss/ssl.h. I don't see you adding
nss/ to CFLAGS anywhere? How does this work currently?

I think it'd also be better if we could include these files as nss/ssl.h
etc - ssl.h is a name way too likely to conflict imo.

> +++ b/src/backend/libpq/be-secure-nss.c
> @@ -0,0 +1,1158 @@
> +/*
> + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to 
> undef
> + * our version to avoid compiler warnings on redefinition.
> + */
> +#define pg_BITS_PER_BYTE BITS_PER_BYTE
> +#undef BITS_PER_BYTE

Most compilers/preprocessors don't warn about redefinitions when they
would result in the same value (IIRC we have some cases of redefinitions
in tree even). Does nspr's differ?


> +/*
> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
> + * colliding definitions from ours, causing a much expected compiler error.
> + * The definitions are however not actually used in NSPR at all, and are only
> + * intended for what seems to be backwards compatibility for apps written
> + * against old versions of NSPR.  The following comment is in the referenced
> + * file, and was added in 1998:
> + *
> + *   This section typedefs the old 'native' types to the new 
> PRs.
> + *   These definitions are scheduled to be eliminated at the earliest
> + *   possible time. The NSPR API is implemented and documented using
> + *   the new definitions.
> + *
> + * As there is no opt-out from pulling in these typedefs, we define the guard
> + * for the file to exclude it. This is incredibly ugly, but seems to be about
> + * the only way around it.
> + */
> +#define PROTYPES_H
> +#include 
> +#undef PROTYPES_H

Yuck :(.


> +int
> +be_tls_init(bool isServerStart)
> +{
> + SECStatus   status;
> + SSLVersionRange supported_sslver;
> +
> + /*
> +  * Set up the connection cache for multi-processing application 
> behavior.

Hm. Do we necessarily want that? Session resumption is not exactly
unproblematic... Or does this do something else?


> +  * If we are in ServerStart then we initialize the cache. If the server 
> is
> +  * already started, we inherit the cache such that it can be used for
> +  * connections. Calling SSL_ConfigMPServerSIDCache sets an environment
> +  * variable which contains enough information for the forked child to 
> know
> +  * how to access it.  Passing NULL to SSL_InheritMPServerSIDCache will
> +  * make the forked child look it up by the default name SSL_INHERITANCE,
> +  * if env vars aren't inherited then the contents of the variable can be
> +  * passed instead.
> +  */

Does this stuff work on windows / EXEC_BACKEND?


> +  * The below parameters are what the implicit initialization would've 
> done
> +  * for us, and should work even for older versions where it might not be
> +  * done automatically. The last parameter, maxPTDs, is set to various
> +  * values in other codebases, but has been unused since NSPR 2.1 which 
> was
> +  * released sometime in 1998.
> +  */
> + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs 

Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2020-10-20 Thread Alvaro Herrera
On 2020-Sep-30, Justin Pryzby wrote:

> CREATE TABLE t(i int) PARTITION BY RANGE(i);
> CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (1) TO (10);
> CREATE OR REPLACE FUNCTION tgf() RETURNS trigger LANGUAGE plpgsql AS $$ begin 
> raise exception 'except'; end $$;
> CREATE TRIGGER tg AFTER INSERT ON t FOR EACH ROW EXECUTE FUNCTION tgf();
> ALTER TABLE t1 DISABLE TRIGGER tg;
> INSERT INTO t VALUES(1); -- inserts when trigger is disabled: good
> ALTER TABLE t DISABLE TRIGGER tg;
> CREATE TABLE t2 PARTITION OF t FOR VALUES FROM (10) TO (20);
> 
> postgres=# SELECT tgrelid::regclass, tgenabled FROM pg_trigger WHERE 
> tgrelid::regclass::text IN ('t1','t2');
>  tgrelid | tgenabled 
> -+---
>  t1  | D
>  t2  | O
> (2 rows)
> 
> I consider this a bug, but CreateTrigStmt doesn't have any "enabled" member
> (since it's impossible to CREATE TRIGGER .. DISABLED), so I'm not sure where
> the fix should be.

Hmm, next question: should we backpatch a fix for this?  (This applies
all the way back to 11.)  If we do, then we would change behavior of
partition creation.  It's hard to see that the current behavior is
desirable ... and I think anybody who would have come across this, would
wish it behaved the other way.  But still -- it would definitely be a
behavior change.

This is a judgement call, and mine says to backpatch, but I've been
wrong on that.




Re: Hash support for row types

2020-10-20 Thread Robert Haas
On Tue, Oct 20, 2020 at 11:10 AM Peter Eisentraut
 wrote:
> On 2020-10-20 01:32, Andres Freund wrote:
> > How does this deal with row types with a field that doesn't have a hash
> > function? Erroring out at runtime could cause queries that used to
> > succeed, e.g. because all fields have btree ops, to fail, if we just have
> > a generic unconditionally present hash opclass?  Is that an OK
> > "regression"?
>
> Good point.  There is actually code in the type cache that is supposed
> to handle that, so I'll need to adjust that.

Do we need to worry about what happens if somebody modifies the
opclass/opfamily definitions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Robert Haas
On Tue, Oct 20, 2020 at 2:34 PM Alvaro Herrera  wrote:
> On 2020-Oct-20, Tom Lane wrote:
> > and the fact that we've gone twenty-odd years without similar
> > previous proposals doesn't speak well for it being really useful.
>
> Actually, there's at least two threads about this:
>
> https://postgr.es/m/flat/86d2ceg611@jerry.enova.com
> https://postgr.es/m/flat/74af1f60-34af-633e-ee8a-310d40c741a7%402ndquadrant.com

Yeah, I think this is clearly useful. ALTER SYSTEM
shared_preload_libraries += direshark seems like a case with a lot of
obvious practical application, and adding things to search_path seems
compelling, too.

FWIW, I also like the chosen syntax. I think it's more useful with
lists than with numbers, but maybe it's at least somewhat useful for
both. However, I do wonder if it'd be good to have a list-manipulation
syntax that allows you to control where the item gets inserted --
start and end being the cases that people would want most, I suppose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Alvaro Herrera
On 2020-Oct-20, Tom Lane wrote:

> and the fact that we've gone twenty-odd years without similar
> previous proposals doesn't speak well for it being really useful.

Actually, there's at least two threads about this:

https://postgr.es/m/flat/86d2ceg611@jerry.enova.com
https://postgr.es/m/flat/74af1f60-34af-633e-ee8a-310d40c741a7%402ndquadrant.com





Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Tom Lane
Andres Freund  writes:
> On 2020-10-20 14:16:12 -0400, Tom Lane wrote:
>> I'd make that point against the whole proposal.  There's nothing here that
>> can't be done with current_setting() + set_config().

> The one case where I can see SET support being useful even without
> config support is to allow for things like
> ALTER DATABASE somedatabase SET search_path += 'myapp';

Hmm, yeah, that's fractionally less easy to build from spare parts
than the plain SET case.

But I think there are more definitional hazards than you are letting
on.  If there's no existing pg_db_role_setting entry, what value
exactly are we += 'ing onto, and why?

regards, tom lane




Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Andres Freund
Hi,

On 2020-10-20 14:16:12 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Given that this is just SQL level, I don't see why we'd need a special
> > type of language here. You can just use DO etc.
> 
> I'd make that point against the whole proposal.  There's nothing here that
> can't be done with current_setting() + set_config().  I'm pretty dubious
> about layering extra functionality into such a fundamental utility command
> as SET; and the fact that we've gone twenty-odd years without similar
> previous proposals doesn't speak well for it being really useful.

From my POV it'd make sense to have SET support mirroring config file
syntax if we had it. And there've certainly been requests for
that...

The one case where I can see SET support being useful even without
config support is to allow for things like
ALTER DATABASE somedatabase SET search_path += 'myapp';

Greetings,

Andres Freund




Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Tom Lane
Andres Freund  writes:
> Given that this is just SQL level, I don't see why we'd need a special
> type of language here. You can just use DO etc.

I'd make that point against the whole proposal.  There's nothing here that
can't be done with current_setting() + set_config().  I'm pretty dubious
about layering extra functionality into such a fundamental utility command
as SET; and the fact that we've gone twenty-odd years without similar
previous proposals doesn't speak well for it being really useful.

regards, tom lane




Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Isaac Morland
On Sun, 27 Sep 2020 at 23:39, Abhijit Menon-Sen  wrote:


> postgres=# SET search_path += octopus;
> SET
> postgres=# SET search_path += "giant squid", kraken, narwhal; -- [1]
> SET
> postgres=# SET search_path -= public, narwhal;
> SET
> postgres=# SHOW search_path;
>

What happens if the new value for += is already in the list?
What about -= on a value that occurs in the list multiple times?


Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Andres Freund
Hi,

On 2020-09-28 09:09:24 +0530, Abhijit Menon-Sen wrote:
> postgres=# SET search_path += octopus;
> SET

Yea, that would be quite useful!



> The second patch
> (0002-Support-SET-syntax-for-numeric-configuration-setting.patch) adds
> support to modify numeric configuration settings:
> 
> postgres=# SET cpu_tuple_cost += 0.02;
> SET
> postgres=# SET effective_cache_size += '2GB';
> SET
> postgres=# SHOW effective_cache_size;
> ┌──────────────────────┐
> │ effective_cache_size │
> ├──────────────────────┤
> │ 6GB                  │
> └──────────────────────┘
> (1 row)
> postgres=# ALTER SYSTEM SET max_worker_processes += 4;
> ALTER SYSTEM

Much less clear that this is a good idea...

It seems to me that appending and incrementing using the same syntax is
a) confusing b) will be a limitation before long.


> These patches do not affect configuration file parsing in any way: its
> use is limited to "SET" and "ALTER xxx SET".

Are you including user / database settings as part of ALTER ... SET? Or
just SYSTEM?


I'm not convinced that having different features for the SQL level is a
good idea.



> (Another feature that could be implemented using this framework is to
> ensure the current setting is at least as large as a given value:
> 
> ALTER SYSTEM SET shared_buffers >= '8GB';

Given that this is just SQL level, I don't see why we'd need a special
type of language here. You can just use DO etc.

Greetings,

Andres Freund




Re: Non-configure build of thread_test has been broken for awhile

2020-10-20 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Oct-18, Tom Lane wrote:
>> It doesn't really seem sane to me to support two different build
>> environments for thread_test, especially when one of them is so
>> little-used that it can be broken for years before we notice.
>> So I'd be inclined to rip out the Makefile and just consider
>> that thread_test.c is *only* meant to be used by configure.
>> If we wish to resurrect the standalone build method, we could
>> probably do so by adding LIBS to the Makefile's link command
>> ... but what's the point, and what will keep it from getting
>> broken again later?

> Standalone usage of that program is evidently non-existant, so +1 for
> removing the Makefile and just keep the configure compile path for it.

I concluded that if thread_test.c will only be used by configure,
then we should stick it under $(SRCDIR)/config/ and nuke the
src/test/thread/ subdirectory altogether.  See attached.

> BTW the only animal reporting without thread-safety in the buildfarm is
> gaur.

Yeah.  At some point maybe we should just drop support for non-thread-safe
platforms, but I'm not proposing to do that yet.

regards, tom lane

diff --git a/src/test/thread/thread_test.c b/config/thread_test.c
similarity index 93%
rename from src/test/thread/thread_test.c
rename to config/thread_test.c
index 09603c95dd..ff2eace87d 100644
--- a/src/test/thread/thread_test.c
+++ b/config/thread_test.c
@@ -1,12 +1,12 @@
 /*-
  *
  * thread_test.c
- *		libc thread test program
+ *		libc threading test program
  *
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- *	src/test/thread/thread_test.c
+ *	config/thread_test.c
  *
  *	This program tests to see if your standard libc functions use
  *	pthread_setspecific()/pthread_getspecific() to be thread-safe.
@@ -20,12 +20,7 @@
  *-
  */
 
-#if !defined(IN_CONFIGURE) && !defined(WIN32)
-#include "postgres.h"
-
-/* we want to know what the native strerror does, not pg_strerror */
-#undef strerror
-#endif
+/* We cannot use c.h, as port.h will not exist yet */
 
 #include 
 #include 
@@ -36,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* CYGWIN requires this for MAXHOSTNAMELEN */
 #ifdef __CYGWIN__
@@ -47,25 +43,11 @@
 #include 
 #endif
 
-
 /* Test for POSIX.1c 2-arg sigwait() and fail on single-arg version */
 #include 
 int			sigwait(const sigset_t *set, int *sig);
 
 
-#if !defined(ENABLE_THREAD_SAFETY) && !defined(IN_CONFIGURE) && !defined(WIN32)
-int
-main(int argc, char *argv[])
-{
-	fprintf(stderr, "This PostgreSQL build does not support threads.\n");
-	fprintf(stderr, "Perhaps rerun 'configure' using '--enable-thread-safety'.\n");
-	return 1;
-}
-#else
-
-/* This must be down here because this is the code that uses threads. */
-#include 
-
 #define		TEMP_FILENAME_1 "thread_test.1"
 #define		TEMP_FILENAME_2 "thread_test.2"
 
@@ -119,11 +101,9 @@ main(int argc, char *argv[])
 		return 1;
 	}
 
-#ifdef IN_CONFIGURE
 	/* Send stdout to 'config.log' */
 	close(1);
 	dup(5);
-#endif
 
 #ifdef WIN32
 	err = WSAStartup(MAKEWORD(2, 2), );
@@ -455,5 +435,3 @@ func_call_2(void)
 	pthread_mutex_lock(_mutex);	/* wait for parent to test */
 	pthread_mutex_unlock(_mutex);
 }
-
-#endif			/* !ENABLE_THREAD_SAFETY && !IN_CONFIGURE */
diff --git a/configure b/configure
index 071d050ef0..ace4ed5dec 100755
--- a/configure
+++ b/configure
@@ -18992,23 +18992,21 @@ $as_echo_n "checking thread safety of required library functions... " >&6; }
 
 _CFLAGS="$CFLAGS"
 _LIBS="$LIBS"
-CFLAGS="$CFLAGS $PTHREAD_CFLAGS -DIN_CONFIGURE"
+CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
 LIBS="$LIBS $PTHREAD_LIBS"
 if test "$cross_compiling" = yes; then :
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: maybe" >&5
 $as_echo "maybe" >&6; }
   { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING:
 *** Skipping thread test program because of cross-compile build.
-*** Run the program in src/test/thread on the target machine.
 " >&5
 $as_echo "$as_me: WARNING:
 *** Skipping thread test program because of cross-compile build.
-*** Run the program in src/test/thread on the target machine.
 " >&2;}
 else
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
-#include "$srcdir/src/test/thread/thread_test.c"
+#include "$srcdir/config/thread_test.c"
 _ACEOF
 if ac_fn_c_try_run "$LINENO"; then :
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
@@ -19017,9 +19015,8 @@ else
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
 $as_echo "no" >&6; }
   as_fn_error $? "thread test program failed
-This platform is not thread-safe.  Check the file 'config.log' or compile
-and run src/test/thread/thread_test for the exact reason.
-Use --disable-thread-safety to disable thread safety." "$LINENO" 5
+This platform is not 

Re: PostgresNode::backup uses spread checkpoint?

2020-10-20 Thread David Steele

On 10/20/20 11:01 AM, Alvaro Herrera wrote:

I noticed a few days ago that method backup() in PostgresNode uses
pg_basebackup without specifying a checkpoint mode -- and the default is
a spread checkpoint, which may cause any tests that use that to take
slightly longer than the bare minimum.

I propose to make it use a fast checkpoint, as per the attached.


+1.

--
-David
da...@pgmasters.net




Re: Hash support for row types

2020-10-20 Thread Peter Eisentraut

On 2020-10-20 01:32, Andres Freund wrote:

How does this deal with row types with a field that doesn't have a hash
function? Erroring out at runtime could cause queries that used to
succeed, e.g. because all fields have btree ops, to fail, if we just have
a generic unconditionally present hash opclass?  Is that an OK
"regression"?


Good point.  There is actually code in the type cache that is supposed 
to handle that, so I'll need to adjust that.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




PostgresNode::backup uses spread checkpoint?

2020-10-20 Thread Alvaro Herrera
I noticed a few days ago that method backup() in PostgresNode uses
pg_basebackup without specifying a checkpoint mode -- and the default is
a spread checkpoint, which may cause any tests that use that to take
slightly longer than the bare minimum.

I propose to make it use a fast checkpoint, as per the attached.

-- 
Álvaro Herrera
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 81ad42916b..72d8393fca 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -551,7 +551,7 @@ sub backup
 
 	print "# Taking pg_basebackup $backup_name from node \"$name\"\n";
 	TestLib::system_or_bail('pg_basebackup', '-D', $backup_path, '-h',
-		$self->host, '-p', $self->port, '--no-sync');
+		$self->host, '-p', $self->port, '-cfast', '--no-sync');
 	print "# Backup finished\n";
 	return;
 }


Re: dynamic result sets support in extended query protocol

2020-10-20 Thread Shay Rojansky
Very interesting conversation, thanks for including me Dave. Here are some
thoughts from the Npgsql perspective,

Re the binary vs. text discussion... A long time ago, Npgsql became a
"binary-only" driver, meaning that it never sends or receives values in
text encoding, and practically always uses the extended protocol. This was
because in most (all?) cases, encoding/decoding binary is more efficient,
and maintaining two encoders/decoders (one for text, one for binary) made
less and less sense. So by default, Npgsql just requests "all binary" in
all Bind messages it sends (there's an API for the user to request text, in
which case they get pure strings which they're responsible for parsing).
Binary handling is implemented for almost all PG types which support it,
and I've hardly seen any complaints about this for the last few years. I'd
be interested in any arguments against this decision (Dave, when have you
seen that decoding binary is slower than decoding text?).

Given the above, allowing the client to specify in advance which types
should be in binary sounds good, but wouldn't help Npgsql much (since by
default it already requests binary for everything). It would slightly help
in allowing binary-unsupported types to automatically come back as text
without manual user API calls, but as I wrote above this is an extremely
rare scenario that people don't care much about.

> Is there really a good reason for forcing the client to issue NextResult,
Describe, Execute for each of the dynamic result sets?

I very much agree - it should be possible to execute a procedure and
consume all results in a single roundtrip, otherwise this is quite a perf
killer.

Peter, from your original message:

> Following the model from the simple query protocol, CommandComplete
really means one result set complete, not the whole top-level command.
ReadyForQuery means the whole command is complete. This is perhaps
debatable, and interesting questions could also arise when considering what
should happen in the simple query protocol when a query string consists of
multiple commands each returning multiple result sets. But it doesn't
really seem sensible to cater to that

Npgsql implements batching of multiple statements via the extended protocol
in a similar way. In other words, the .NET API allows users to pack
multiple SQL statements and execute them in one roundtrip, and Npgsql does
this by sending
Parse1/Bind1/Describe1/Execute1/Parse2/Bind2/Describe2/Execute2/Sync. So
CommandComplete signals completion of a single statement in the batch,
whereas ReadyForQuery signals completion of the entire batch. This means
that the "interesting questions" mentioned above are possibly relevant to
the extended protocol as well.


Re: partition routing layering in nodeModifyTable.c

2020-10-20 Thread Amit Langote
On Mon, Oct 19, 2020 at 8:55 PM Heikki Linnakangas  wrote:
> On 17/10/2020 18:54, Alvaro Herrera wrote:
> > On 2020-Oct-17, Amit Langote wrote:
> >> As I said in my previous email, I don't see how we can make
> >> initializing the map any lazier than it already is.  If a partition
> >> has a different tuple descriptor than the root table, then we know for
> >> sure that any tuples that are routed to it will need to be converted
> >> from the root tuple format to its tuple format, so we might as well
> >> build the map when the ResultRelInfo is built.  If no tuple lands into
> >> a partition, we would neither build its ResultRelInfo nor the map.
> >> With the current arrangement, if the map field is NULL, it simply
> >> means that the partition has the same tuple format as the root table.
> >
> > I see -- makes sense.
>
> It's probably true that there's no performance gain from initializing
> them more lazily. But the reasoning and logic around the initialization
> is complicated. After tracing through various paths through the code, I'm
> convinced enough that it's correct, or at least these patches didn't
> break it, but I still think some sort of lazy initialization on first
> use would make it more readable. Or perhaps there's some other
> refactoring we could do.

So the other patch I have mentioned is about lazy initialization of
the ResultRelInfo itself, not the individual fields, but maybe with
enough refactoring we can get the latter too.

Currently, ExecInitModifyTable() performs ExecInitResultRelation() for
all relations in ModifyTable.resultRelations, which sets most but not
all ResultRelInfo fields (whatever InitResultRelInfo() can set),
followed by initializing some other fields based on the contents of
the ModifyTable plan.  My patch moves those two steps into a function
ExecBuildResultRelation() which is called lazily during
ExecModifyTable() for a given result relation on the first tuple
produced by that relation's plan.  Actually, there's a "getter" named
ExecGetResultRelation() which first consults es_result_relations[rti -
1] for the requested relation and if it's NULL then calls
ExecBuildResultRelation().
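
To make the shape concrete, a rough sketch of that getter (the signature and
the exact bookkeeping are assumptions for illustration, not the actual patch):

/* Sketch only: fetch the ResultRelInfo for result relation 'rti',
 * building it lazily on first use. */
static ResultRelInfo *
ExecGetResultRelation(ModifyTableState *mtstate, EState *estate, Index rti)
{
	ResultRelInfo *resultRelInfo = estate->es_result_relations[rti - 1];

	if (resultRelInfo == NULL)
	{
		/* first tuple produced by this relation's plan: build it now */
		resultRelInfo = ExecBuildResultRelation(mtstate, estate, rti);
		estate->es_result_relations[rti - 1] = resultRelInfo;
	}

	return resultRelInfo;
}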

Would you mind taking a look at that as a starting point?  I am
thinking there's enough relevant discussion here that I should post
the rebased version of that patch here.

> Perhaps we should have a magic TupleConversionMap value to mean "no
> conversion required". NULL could then mean "not initialized yet".

Perhaps, a TupleConversionMap with its attrMap set to NULL means "no
conversion required".
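
For illustration, at the point where a routed tuple gets converted that
convention could look roughly like this (a sketch under that assumption;
the field and helper names are hypothetical):

/* NULL map = not initialized yet; attrMap == NULL = no conversion required */
TupleConversionMap *map = partrinfo->ri_RootToPartitionMap;	/* hypothetical field */

if (map == NULL)
	map = build_root_to_partition_map(partrinfo);	/* hypothetical: build on first use */

if (map->attrMap == NULL)
	slot = rootslot;		/* same tuple format as the root table */
else
	slot = execute_attr_map_slot(map->attrMap, rootslot, partslot);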


--
Amit Langote
EDB: http://www.enterprisedb.com




Re: Make procedure OUT parameters work with JDBC

2020-10-20 Thread Andrew Dunstan


On 10/19/20 8:35 PM, Craig Ringer wrote:
>
>
> On Mon, 19 Oct 2020, 19:16 Andrew Dunstan wrote:
>
>
> On 10/19/20 5:19 AM, Peter Eisentraut wrote:
> > A follow-up to the recently added support for OUT parameters for
> > procedures.  The JDBC driver sends OUT parameters with type void. 
> > This makes sense when calling a function, so that the parameters are
> > ignored in ParseFuncOrColumn().  For a procedure call we want to
> treat
> > them as unknown.  This is of course a bit of a hack on top of
> another
> > hack, but it's small and contained and gets the job done.
> >
>
>
> The JDBC spec defines CallableStatement.registerOutParameter(...)
> variants that take SQLType enumeration value and optionally type name.
>
> It's important that this change not break correct and fully specified
> use of the CallableStatement interface.


The JDBC driver currently implements this but discards any type
information and sends VOIDOID. This patch accommodates that. This
actually works fine, except in the case of overloaded procedures, where
the workaround is to include an explicit cast in the CALL statement.

Modifying the JDBC driver to send real type info for these cases is
something to be done, but there are some difficulties in that the class
where it's handled doesn't have enough context. And there will also
always be cases where it really doesn't know what to send (user defined
types etc.), so sending VOIDOID or UNKNOWNOID will still be done.


cheers


andrew


-- 
Andrew Dunstan
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Wired if-statement in gen_partprune_steps_internal

2020-10-20 Thread Amit Langote
Hi Andy,

On Tue, Oct 20, 2020 at 4:05 PM Andy Fan  wrote:
> On Wed, Oct 14, 2020 at 11:26 AM Andy Fan  wrote:
>> On Mon, Oct 12, 2020 at 4:37 PM Amit Langote  wrote:
>>> I think we should remove this duplicative logic and return the
>>> generated steps in a list from this function, which the code in
>>> gen_partprune_steps_internal() then "combines" using an INTERSECT
>>> step.  See attached a patch to show what I mean.
>>>
>>
>> This changes LGTM, and "make check" PASSED,  thanks for the patch!
>>
>
> I created https://commitfest.postgresql.org/30/2771/ so that this patch will 
> not
> be lost.  Thanks!

Thanks for doing that.

I had updated the patch last week to address Horiguchi-san's comments
but didn't manage to post a polished-enough version.  I will try again
this week.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2020-10-20 Thread Dilip Kumar
On Tue, Oct 20, 2020 at 3:00 PM Simon Riggs  wrote:
>
> On Tue, 20 Oct 2020 at 09:50, Dilip Kumar  wrote:
>
> > > > Why would we want this? What problem are you trying to solve?
> > >
> > > The requirement is to know the last replayed WAL on the standby so
> > > unless we can guarantee that the recovery is actually paused we can
> > > never get the safe last_replay_lsn value.
> > >
> > > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as 
> > > > you wish?
> > >
> > > Maybe we can also do that but pg_is_wal_replay_paused is an existing
> > > API and the behavior is to know whether the recovery paused is
> > > requested or not,  So I am not sure is it a good idea to change the
> > > behavior of the existing API?
> > >
> >
> > Attached is the POC patch to show what I have in mind.
>
> If you don't like it, I doubt anyone else cares for the exact current
> behavior either. Thanks for pointing those issues out.
>
> It would make sense to alter pg_wal_replay_pause() so that it blocks
> until paused.
>
> I suggest you add the 3-value state as you suggest, but make
> pg_is_wal_replay_paused() respond:
> if paused, true
> if requested, wait until paused, then return true
> else false
>
> That then solves your issues with a smoother interface.
>

Make sense to me, I will change as per the suggestion.
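
For illustration, the blocking variant could end up looking roughly like this
(the enum values and accessor are assumptions, not the final patch):

Datum
pg_is_wal_replay_paused(PG_FUNCTION_ARGS)
{
	for (;;)
	{
		/* hypothetical accessor reading XLogCtl->recoveryPause under the spinlock */
		RecoveryPauseState state = GetRecoveryPauseState();

		if (state == RECOVERY_PAUSED)
			PG_RETURN_BOOL(true);
		if (state == RECOVERY_NOT_PAUSED)	/* assumed "not requested" value */
			PG_RETURN_BOOL(false);

		/* pause requested but not yet reached: wait for the startup process */
		CHECK_FOR_INTERRUPTS();
		pg_usleep(10000L);		/* 10ms */
	}
}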

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Resetting spilled txn statistics in pg_stat_replication

2020-10-20 Thread Masahiko Sawada
On Tue, 20 Oct 2020 at 20:11, Amit Kapila  wrote:
>
> On Mon, Oct 19, 2020 at 1:20 PM Masahiko Sawada
>  wrote:
> >
> > On Mon, 19 Oct 2020 at 14:24, Amit Kapila  wrote:
> > >
> >
> > > So, let's
> > > stick with 'text' data type for slot_name which means we should
> > > go-ahead with the v3 version of the patch [1] posted by Shinoda-San,
> > > right?
> >
> > Right. +1
> >
>
> I have pushed the patch after updating the test_decoding/sql/stats.
> The corresponding changes in the test were missing.

Thank you!

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread Masahiko Sawada
On Tue, 20 Oct 2020 at 17:56, tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Kyotaro Horiguchi 
> > It seems to respond to a statement-cancel signal immediately while
> > waiting for a coming byte.  However, it seems to wait forever while
> > waiting for space in the send-buffer. (Does that mean the session will be
> > stuck if it sends a large chunk of bytes while the network is down?)
>
> What part makes you worried about that?  libpq's send processing?
>
> I've just examined pgfdw_cancel_query(), too.  As below, it uses a hidden 30 
> second timeout.  After all, postgres_fdw also relies on timeout already.

It uses the timeout but it's also cancellable before the timeout. See
we call CHECK_FOR_INTERRUPTS() in pgfdw_get_cleanup_result().
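
For reference, the pattern in question is roughly the following loop (a
simplified sketch, not the exact postgres_fdw code):

/* wait for a libpq result with both a deadline and cancellability */
while (PQisBusy(conn))
{
	TimestampTz now = GetCurrentTimestamp();
	long		secs;
	int			microsecs;
	long		cur_timeout;

	if (now >= endtime)
		break;					/* give up; caller treats the connection as dead */

	TimestampDifference(now, endtime, &secs, &microsecs);
	cur_timeout = secs * 1000 + microsecs / 1000;

	(void) WaitLatchOrSocket(MyLatch,
							 WL_LATCH_SET | WL_SOCKET_READABLE |
							 WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
							 PQsocket(conn), cur_timeout, PG_WAIT_EXTENSION);
	ResetLatch(MyLatch);

	CHECK_FOR_INTERRUPTS();		/* a statement cancel is honoured here */

	if (!PQconsumeInput(conn))
		break;					/* connection trouble */
}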

>
>
> > After receiving a signal, it closes the problem connection. So the
> > local session is usable after that but the failed remote sessions are
> > closed and another one is created at the next use.
>
> I couldn't see that the problematic connection is closed when the 
> cancellation fails... Am I looking at a wrong place?
>
> /*
>  * If connection is already unsalvageable, don't touch it
>  * further.
>  */
> if (entry->changing_xact_state)
> break;
>

I guess Horiguchi-san refereed the following code in pgfdw_xact_callback():

/*
 * If the connection isn't in a good idle state, discard it to
 * recover. Next GetConnection will open a new connection.
 */
if (PQstatus(entry->conn) != CONNECTION_OK ||
PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
entry->changing_xact_state)
{
elog(DEBUG3, "discarding connection %p", entry->conn);
disconnect_pg_server(entry);
}

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgbench - test whether a variable exists

2020-10-20 Thread Ibrar Ahmed
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   tested, failed
Documentation:not tested

I am not fully convinced that we need this, but I have performed some testing on 
master ().

I found a crash 

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `pgbench postgres -T 60 -f a.sql'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x55bfb572a4b2 in expr_yylex (yylval_param=0x7fff29a60870, 
yyscanner=0x55bfb6867a90) at exprscan.c:1107
1107    *yy_cp = yyg->yy_hold_char;
(gdb) bt
#0  0x55bfb572a4b2 in expr_yylex (yylval_param=0x7fff29a60870, 
yyscanner=0x55bfb6867a90) at exprscan.c:1107
#1  0x55bfb572cb63 in expr_lex_one_word (state=0x55bfb6867a10, 
word_buf=0x7fff29a608d0, offset=0x7fff29a608a8) at exprscan.l:340
#2  0x55bfb57358fd in process_backslash_command (sstate=0x55bfb6867a10, 
source=0x55bfb68653a0 "a.sql") at pgbench.c:4540
#3  0x55bfb5736528 in ParseScript (
script=0x55bfb68655f0 "\\set nbranches :scale\n\\set ntellers 10 * 
:scale\n\\set naccounts 10 * :scale\n\\setrandom aid 1 
:naccounts\n\\setrandom bid 1 :nbranches\n\\setrandom tid 1 
:ntellers\n\\setrandom delta -5000 5000\nBEGIN;\nUPD"..., desc=0x55bfb68653a0 
"a.sql", weight=1) at pgbench.c:4826
#4  0x55bfb5736967 in process_file (filename=0x55bfb68653a0 "a.sql", 
weight=1) at pgbench.c:4962
#5  0x55bfb5738628 in main (argc=6, argv=0x7fff29a610d8) at pgbench.c:5641

The new status of this patch is: Waiting on Author


Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Erik Rijkers

On 2020-10-20 13:02, Ibrar Ahmed wrote:
The following review has been posted through the commitfest 
application:

make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Thanks for the patch. The patch works on my machine as per specs on
the master branch (26ec6b5948a73d0e07ed9435ee4554594acdf34f).


FWIW, I checked the same steps,
including doc-builds of PDF-A4, and .html (on debian 9/stretch, with gcc 
10.1)


I found no problems.

Nice feature!


Erik Rijkers




Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread Masahiko Sawada
On Tue, 20 Oct 2020 at 16:54, tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Kyotaro Horiguchi 
> > At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada
> >  wrote in
> > > I think it doesn't matter whether in FDW framework or not. The user
> > > normally doesn't care which backend processes connecting to foreign
> > > servers. They will attempt to cancel the query like always if they
> > > realized that a backend gets stuck. There are surely plenty of users
> > > who use query cancellation.
> >
> > The most serious impact from inability of canceling a query on a
> > certain session is that server-restart is required to end such a
> > session.
>
> OK, as I may be repeating, I didn't deny the need for cancellation.

So what's your opinion?

> Let''s organize the argument.
>
> * FDW in general
> My understanding is that the FDW feature does not stipulate anything about 
> cancellation.  In fact, odbc_fdw was uncancelable.  What do we do about this?
>
> * postgres_fdw
> Fortunately, it is (should be?) cancelable whatever method we choose for 2PC. 
>  So no problem.
> But is it really cancellable now?  What if the libpq call is waiting for 
> response when the foreign server or network is down?

I don't think we need to stipulate query cancellation. In any case, the
fact that we don't stipulate anything about query cancellation now, or
that postgres_fdw might not be cancellable in some situations, is not a
reason to avoid supporting query cancellation. If it's desirable
behavior and users want it, we need to put in the effort to support it
as much as possible, as we've done in postgres_fdw.  Some FDWs
unfortunately might not be able to support it with their own
functionality alone, but it would be good if we can achieve it through a
combination of PostgreSQL and the FDW plugins.

Regards,

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [doc] improve tableoid description

2020-10-20 Thread Ashutosh Bapat
On Mon, Oct 19, 2020 at 5:58 PM Ian Lawrence Barwick  wrote:
>
> > That line further suggests using
> > regnamespace which is not as efficient as joining with
> > pg_namespace.oid. But pg_namespace won't have as many entries as
> > pg_class so casting to regnamespace might be fine. Should we suggest
> > both the methods somehow?
>
> On further reflection, I think trying to explain all that is going to
> end up as a
> mini-tutorial which is beyond the scope of the explanation of a column, so
> the existing reference to pg_class should be enough.



>
> Revised patch attached just mentioning partitioned tables.

From a user's point of view, it makes sense to differentiate between
partitioning and inheritance, though internally the former uses the
latter.

Maybe we could just generalize the sentence as "tableoid can be used
to obtain the table name either by joining against the oid column of
pg_class or casting it to regclass as appropriate." Or just "tableoid
can be used to obtain the table name.". Probably the users would find
out how to do that from some other part of the document.

  tableoid can be joined against the
  oid column of
  pg_class to obtain the table name.

But even without that change, the current patch is useful. Please add
it to commitfest so it's not forgotten.

-- 
Best Wishes,
Ashutosh Bapat




Re: speed up unicode decomposition and recomposition

2020-10-20 Thread John Naylor
On Tue, Oct 20, 2020 at 3:22 AM Michael Paquier  wrote:

> On Mon, Oct 19, 2020 at 10:34:33AM -0400, John Naylor wrote:
> > I don't see any difference on gcc/Linux in those two files, nor in
> > unicode_norm_shlib.o -- I do see a difference in unicode_norm_srv.o as
> > expected. Could it depend on the compiler?
>
> Hmm.  My guess is that you don't have --enable-debug in your set of
> configure options?  It is not unusual to have this one enabled for GCC
> even on production systems, and the size of the libs is impacted in
> this case with your patch.
>

I've confirmed that. How about a new header unicode_norm_hashfunc.h which
would include unicode_norm_table.h at the top? In unicode.c, we can include
one of these depending on frontend or backend.

-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Resetting spilled txn statistics in pg_stat_replication

2020-10-20 Thread Amit Kapila
On Mon, Oct 19, 2020 at 1:20 PM Masahiko Sawada
 wrote:
>
> On Mon, 19 Oct 2020 at 14:24, Amit Kapila  wrote:
> >
>
> > So, let's
> > stick with 'text' data type for slot_name which means we should
> > go-ahead with the v3 version of the patch [1] posted by Shinoda-San,
> > right?
>
> Right. +1
>

I have pushed the patch after updating the test_decoding/sql/stats.
The corresponding changes in the test were missing.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] SET search_path += octopus

2020-10-20 Thread Ibrar Ahmed
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Thanks for the patch. The patch works on my machine as per specs on the master 
branch (26ec6b5948a73d0e07ed9435ee4554594acdf34f).

Re: [HACKERS] logical decoding of two-phase transactions

2020-10-20 Thread Ajin Cherian
On Fri, Oct 16, 2020 at 5:21 PM Peter Smith  wrote:
>
> Hello Ajin,
>
> The v9 patches provided support for two-phase transactions for NON-streaming.
>
> Now I have added STREAM support for two-phase transactions, and bumped
> all patches to version v10.
>
> (The 0001 and 0002 patches are unchanged. Only 0003 is changed).
>
> --
>
> There are a few TODO/FIXME comments in the code highlighting parts
> needing some attention.
>
> There is a #define DEBUG_STREAM_2PC useful for debugging, which I can
> remove later.
>
> All the patches have some whitespaces issues when applied. We can
> resolve them as we go.
>
> Please let me know any comments/feedback.

Hi Peter,

Thanks for your patch. Some comments for your patch:

Comments:

src/backend/replication/logical/worker.c
@@ -888,6 +888,319 @@ apply_handle_prepare(StringInfo s)
+ /*
+ * FIXME - Following condition was in apply_handle_prepare_txn except
I found  it was ALWAYS IsTransactionState() == false
+ * The synchronization worker runs in single transaction. *
+ if (IsTransactionState() && !am_tablesync_worker())
+ */
+ if (!am_tablesync_worker())

Comment: I don't think a tablesync worker will use streaming; none of
the other stream APIs check this, so it might not be relevant for
stream_prepare either.


+ /*
+ * 
==
+ * The following chunk of code is largely cut/paste from the existing
apply_handle_prepare_commit_txn

Comment: Here, I think you meant apply_handle_stream_commit. Also
rather than duplicating this chunk of code, you could put it in a new
function.

+ /* open the spool file for the committed transaction */
+ changes_filename(path, MyLogicalRepWorker->subid, xid);

Comment: Here the comment should read "committed/prepared" rather than
"committed"


+ else
+ {
+ /* Process any invalidation messages that might have accumulated. */
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }

Comment: This else block might not be necessary as a tablesync worker
will not initiate the streaming APIs.

+ BeginTransactionBlock();
+ CommitTransactionCommand();
+ StartTransactionCommand();

Comment: Rereading the code and the transaction state description in
src/backend/access/transam/README. I am not entirely sure if the
BeginTransactionBlock followed by CommitTransactionBlock is really
needed here.
I understand this code was copied over from apply_handle_prepare_txn,
but now looking back I'm not so sure if it is correct. The transaction
would have already begun as part of applying the changes, so why begin it
again?
Maybe Amit could confirm this.

END

regards,
Ajin Cherian
Fujitsu Australia




Error in pg_restore (could not close data file: Success)

2020-10-20 Thread Andrii Tkach
Hello,
After restoring 'directory' backup with pg_restore (PostgreSQL 10.6) I've
got a message:
pg_restore: [directory archiver] could not close data file: Success
pg_restore: [parallel archiver] a worker process died unexpectedly
In this thread:
https://www.postgresql.org/message-id/CAFcNs%2Bos5ExGvXMBrBBzzuJJamoHt5-zdJdxX39nkVG0KUxwsw%40mail.gmail.com
there is only one answer. I'm interested to know: is this normal behavior of
pg_restore, and was the backup restored normally or not?

Regards, Andrii


Re: dynamic result sets support in extended query protocol

2020-10-20 Thread Dave Cramer
On Tue, 20 Oct 2020 at 05:57, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2020-10-09 21:02, Dave Cramer wrote:
> > For the most part we know exactly which types we want in binary for 99%
> > of queries.
> >
> > The hard part around this really is whether and how to deal with
> changes
> > in type definitions. From types just being created - comparatively
> > simple - to extensions being dropped and recreated, with oids
> > potentially being reused.
> >
> >
> > Fair point but this is going to be much more complex than just sending
> > most of the results in binary which would speed up the overwhelming
> > majority of queries
>
> I've been studying in more detail how the JDBC driver handles binary
> format use.  Having some kind of message "use binary for these types"
> would match its requirements quite exactly.  (I have also studied
> npgsql, but it appears to work quite differently.  More input from there
> and other places with similar requirements would be welcome.)  The
> question as mentioned above is how to deal with type changes.  Let's
> work through a couple of options.
>

I've added Vladimir (pgjdbc), Shay (npgsql) and Mark Paluch (r2dbc)  to
this discussion.
I'm sure there are others but I'm not acquainted with them

>
> We could send the type/format list with every query.  For example, we
> could extend/enhance/alter the Bind message so that instead of a
> format-per-column it sends a format-per-type.  But then you'd need to
> send the complete type list every time.  The JDBC driver currently has
> 20+ types already hardcoded and more optionally, so you'd send 100+
> bytes for every query, plus required effort for encoding and decoding.
> That seems unattractive.
>
> Or we send the type/format list once near the beginning of the session.
> Then we need to deal with types being recreated or updated etc.
>
> The first option is that we "lock" the types against changes (ignoring
> whether that's actually possible right now).  That would mean you
> couldn't update an affected type/extension while a JDBC session is
> active.  That's no good.  (Imagine connection pools with hours of server
> lifetime.)
>
> Another option is that we invalidate the session when a thus-registered
> type changes.  Also no good.  (We don't want an extension upgrade
> suddenly breaking all open connections.)
>
Agreed, the first 2 options are not viable.


> Finally, we could do it on a best-effort basis.  We use binary format
> for registered types, until there is some invalidation event for the
> type, at which point we revert to default/text format until the end of a
> session (or until another protocol message arrives re-registering the
> type).


Does the driver tell the server what registered types it wants in binary ?


> This should work, because the result row descriptor contains the
> actual format type, and there is no guarantee that it's the same one
> that was requested.
>
> So how about that last option?  I imagine a new protocol message, say,
> TypeFormats, that contains a number of type/format pairs.  The message
> would typically be sent right after the first ReadyForQuery, gets no
> response.


This seems a bit hard to control. How long do you wait for no response?


> It could also be sent at any other time, but I expect that to
> be less used in practice.  Binary format is used for registered types if
> they have binary format support functions, otherwise text continues to
> be used.  There is no error response for types without binary support.
> (There should probably be an error response for registering a type that
> does not exist.)
>
I'm not sure we (pgjdbc) want all types with binary support functions sent
automatically. It turns out that decoding binary is sometimes slower than
decoding text, and the on-wire overhead isn't significant.
Timestamps/dates with timezone are also interesting as the binary output
does not include the timezone.

The notion of a status change message is appealing however. I used the term
status change on purpose as there are other server changes we would like to
be made aware of. For instance if someone changes the search path, we would
like to know. I'm sort of expanding the scope here but if we are imagining
... :)

Dave


Re: dynamic result sets support in extended query protocol

2020-10-20 Thread Peter Eisentraut

On 2020-10-09 21:02, Dave Cramer wrote:
For the most part we know exactly which types we want in binary for 99% 
of queries.


The hard part around this really is whether and how to deal with changes
in type definitions. From types just being created - comparatively
simple - to extensions being dropped and recreated, with oids
potentially being reused.


Fair point but this is going to be much more complex than just sending 
most of the results in binary which would speed up the overwhelming 
majority of queries


I've been studying in more detail how the JDBC driver handles binary 
format use.  Having some kind of message "use binary for these types" 
would match its requirements quite exactly.  (I have also studied 
npgsql, but it appears to work quite differently.  More input from there 
and other places with similar requirements would be welcome.)  The 
question as mentioned above is how to deal with type changes.  Let's 
work through a couple of options.


We could send the type/format list with every query.  For example, we 
could extend/enhance/alter the Bind message so that instead of a 
format-per-column it sends a format-per-type.  But then you'd need to 
send the complete type list every time.  The JDBC driver currently has 
20+ types already hardcoded and more optionally, so you'd send 100+ 
bytes for every query, plus required effort for encoding and decoding. 
That seems unattractive.


Or we send the type/format list once near the beginning of the session. 
Then we need to deal with types being recreated or updated etc.


The first option is that we "lock" the types against changes (ignoring 
whether that's actually possible right now).  That would mean you 
couldn't update an affected type/extension while a JDBC session is 
active.  That's no good.  (Imagine connection pools with hours of server 
lifetime.)


Another option is that we invalidate the session when a thus-registered 
type changes.  Also no good.  (We don't want an extension upgrade 
suddenly breaking all open connections.)


Finally, we could do it on a best-effort basis.  We use binary format 
for registered types, until there is some invalidation event for the 
type, at which point we revert to default/text format until the end of a 
session (or until another protocol message arrives re-registering the 
type).  This should work, because the result row descriptor contains the 
actual format type, and there is no guarantee that it's the same one 
that was requested.


So how about that last option?  I imagine a new protocol message, say, 
TypeFormats, that contains a number of type/format pairs.  The message 
would typically be sent right after the first ReadyForQuery, gets no 
response.  It could also be sent at any other time, but I expect that to 
be less used in practice.  Binary format is used for registered types if 
they have binary format support functions, otherwise text continues to 
be used.  There is no error response for types without binary support. 
(There should probably be an error response for registering a type that 
does not exist.)
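
Purely as an illustration of the idea (every name below is hypothetical;
nothing like this exists yet), the per-column choice on the server side might
boil down to something like:

/* Sketch: decide the wire format for one result column, best-effort */
static int16
choose_result_format(Oid typid)
{
	/*
	 * typeformats_is_registered(), typeformats_invalidated() and
	 * type_has_binary_send() stand in for whatever state a TypeFormats
	 * message would maintain; none of them exist today.
	 */
	if (typeformats_is_registered(typid) &&
		!typeformats_invalidated(typid) &&
		type_has_binary_send(typid))
		return 1;				/* binary */

	return 0;					/* text, until the client re-registers the type */
}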


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Parallel copy

2020-10-20 Thread Bharath Rupireddy
On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila 
wrote:
> >
> > 2. Do we have tests for toast tables? I think if you implement the
> > previous point some existing tests might cover it but I feel we should
> > have at least one or two tests for the same.
> >
> Toast table use case 1: 1 tuples, 9.6GB data, 3 indexes 2 on integer
columns, 1 on text column(not the toast column), csv file, each row is >
1320KB:
> (222.767, 0, 1X), (134.171, 1, 1.66X), (93.749, 2, 2.38X), (93.672, 4,
2.38X), (94.827, 8, 2.35X), (93.766, 16, 2.37X), (98.153, 20, 2.27X),
(122.721, 30, 1.81X)
>
> Toast table use case 2: 10 tuples, 96GB data, 3 indexes 2 on integer
columns, 1 on text column(not the toast column), csv file, each row is >
1320KB:
> (2255.032, 0, 1X), (1358.628, 1, 1.66X), (901.170, 2, 2.5X), (912.743, 4,
2.47X), (988.718, 8, 2.28X), (938.000, 16, 2.4X), (997.556, 20, 2.26X),
(1000.586, 30, 2.25X)
>
> Toast table use case3: 1 tuples, 9.6GB, no indexes, binary file, each
row is > 1320KB:
> (136.983, 0, 1X), (136.418, 1, 1X), (81.896, 2, 1.66X), (62.929, 4,
2.16X), (52.311, 8, 2.6X), (40.032, 16, 3.49X), (44.097, 20, 3.09X),
(62.310, 30, 2.18X)
>
> In the case of a Toast table, we could achieve upto 2.5X for csv files,
and 3.5X for binary files. We are analyzing this point and will post an
update on our findings soon.
>

I analyzed the above point of getting only up to 2.5X performance
improvement for csv files with a toast table with 3 indexes - 2 on integer
columns and 1 on a text column (not the toast column). The reason is that the
workers are fast enough to do the work and spend time waiting for the leader
to fill in the data blocks, while the leader is already serving the workers
at its maximum possible speed. Hence, most of the time the workers are
waiting rather than doing any useful work.

Having observed the above point, I tried to make workers perform more work
to avoid waiting time. For this, I added a gist index on the toasted text
column. The use and results are as follows.

Toast table use case4: 1 tuples, 9.6GB, 4 indexes - 2 on integer
columns, 1 on non-toasted text column and 1 gist index on toasted text
column, csv file, each row is  ~ 12.2KB:

(1322.839, 0, 1X), (1261.176, 1, 1.05X), (632.296, 2, 2.09X), (321.941, 4,
4.11X), (181.796, 8, 7.27X), *(105.750, 16, 12.51X)*, (107.099, 20,
12.35X), (123.262, 30, 10.73X)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Re: speed up unicode normalization quick check

2020-10-20 Thread John Naylor
On Mon, Oct 19, 2020 at 9:49 PM Michael Paquier  wrote:

>
> The aligned numbers have the advantage to make the checks of the code
> generated easier, for the contents and the format produced.  So using
> a right padding as you are suggesting here rather than a new exception
> in .gitattributes sounds fine to me.  I simplified things a bit as the
> attached, getting rid of the last comma while on it.  Does that look
> fine to you?
>

This is cleaner, so I'm good with this.

-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Is Recovery actually paused?

2020-10-20 Thread Simon Riggs
On Tue, 20 Oct 2020 at 09:50, Dilip Kumar  wrote:

> > > Why would we want this? What problem are you trying to solve?
> >
> > The requirement is to know the last replayed WAL on the standby so
> > unless we can guarantee that the recovery is actually paused we can
> > never get the safe last_replay_lsn value.
> >
> > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as 
> > > you wish?
> >
> > Maybe we can also do that but pg_is_wal_replay_paused is an existing
> > API and the behavior is to know whether the recovery paused is
> > requested or not,  So I am not sure is it a good idea to change the
> > behavior of the existing API?
> >
>
> Attached is the POC patch to show what I have in mind.

If you don't like it, I doubt anyone else cares for the exact current
behavior either. Thanks for pointing those issues out.

It would make sense to alter pg_wal_replay_pause() so that it blocks
until paused.

I suggest you add the 3-value state as you suggest, but make
pg_is_wal_replay_paused() respond:
if paused, true
if requested, wait until paused, then return true
else false

That then solves your issues with a smoother interface.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Online verification of checksums

2020-10-20 Thread Michael Paquier
On Mon, Apr 06, 2020 at 04:45:44PM -0400, Tom Lane wrote:
> Actually, after thinking about that a bit more: why is there an LSN-based
> special condition at all?  It seems like it'd be far more useful to
> checksum everything, and on failure try to re-read and re-verify the page
> once or twice, so as to handle the corner case where we examine a page
> that's in process of being overwritten.

I was reviewing this area today, and that actually matches my
impression.  Why do we need an LSN-based check at all?  As said
upthread, that's of course weak with random data as we would miss most
of the real checksum failures, with the odds improving as the cluster's
current LSN moves on.  However, it seems to me
that we would have an extra advantage in removing this check
altogether: it would be possible to check pages even if they
are more recent than the start LSN of the backup, and that could be a
lot of pages on a large cluster.  So by keeping
this check we also delay the detection of real problems.  As things
stand, I'd like to think that it would be much more useful to remove
this check and to have one or two extra retries (the current code only
has one).  I don't like much the possibility of false positives for
such critical checks, but as we need to live with what has been
released, that looks like a good move for stable branches.
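
As an illustration of that direction, the per-page check could be shaped
roughly like this (a sketch with assumed structure and names, ignoring
segment offsets and new/zeroed pages, not a patch):

#define CHECKSUM_RETRIES 2

static bool
verify_page_with_retries(int fd, BlockNumber blkno, char *buf)
{
	for (int attempt = 0; attempt <= CHECKSUM_RETRIES; attempt++)
	{
		if (pread(fd, buf, BLCKSZ, (off_t) blkno * BLCKSZ) != BLCKSZ)
			return false;		/* short read or read error, handled by caller */

		if (pg_checksum_page(buf, blkno) == ((PageHeader) buf)->pd_checksum)
			return true;		/* page verified */

		/* possibly caught a page mid-write: wait briefly and re-read */
		pg_usleep(100 * 1000L);
	}
	return false;				/* persistent checksum mismatch */
}
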
--
Michael




select_common_typmod

2020-10-20 Thread Peter Eisentraut
While working on another patch, I figured adding a 
select_common_typmod() to go along with select_common_type() and 
select_common_collation() would be handy.  Typmods were previously 
combined using hand-coded logic in several places, and not at all in 
other places.  The logic in select_common_typmod() isn't very exciting, 
but it makes the code more compact and readable in a few locations, and 
in the future we can perhaps do more complicated things if desired.


There might have been a tiny bug in transformValuesClause() because 
while consolidating the typmods it does not take into account whether 
the types are actually the same (as more correctly done in 
transformSetOperationTree() and buildMergedJoinVar()).
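
For reference, the combining rule described above amounts to roughly this
(a condensed sketch of the idea, not the exact code from the attached patch):

/* Sketch: common typmod of a list of expressions that share common_type;
 * any mismatch in type or typmod falls back to -1 */
int32
select_common_typmod(ParseState *pstate, List *exprs, Oid common_type)
{
	int32		result = -1;
	bool		first = true;
	ListCell   *lc;

	foreach(lc, exprs)
	{
		Node	   *expr = (Node *) lfirst(lc);

		if (exprType(expr) != common_type)
			return -1;			/* differing types: no common typmod */
		if (first)
		{
			result = exprTypmod(expr);
			first = false;
		}
		else if (result != exprTypmod(expr))
			return -1;			/* differing typmods: use the default */
	}
	return result;
}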


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 8a5af28e7c41055cfbcedd9aa07cf6714b60c178 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 20 Oct 2020 09:28:48 +0200
Subject: [PATCH] Add select_common_typmod()

This accompanies select_common_type() and select_common_collation().
Typmods were previously combined using hand-coded logic in several
places.  The logic in select_common_typmod() isn't very exciting, but
it makes the code more compact and readable in a few locations, and in
the future we can perhaps do more complicated things if desired.
---
 src/backend/parser/analyze.c  | 26 +-
 src/backend/parser/parse_clause.c | 23 +--
 src/backend/parser/parse_coerce.c | 37 +++
 src/backend/parser/parse_func.c   |  8 +--
 src/include/parser/parse_coerce.h |  2 ++
 5 files changed, 57 insertions(+), 39 deletions(-)

diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index c159fb2957..98a83db8b5 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1434,9 +1434,8 @@ transformValuesClause(ParseState *pstate, SelectStmt 
*stmt)
for (i = 0; i < sublist_length; i++)
{
Oid coltype;
-   int32   coltypmod = -1;
+   int32   coltypmod;
Oid colcoll;
-   boolfirst = true;
 
coltype = select_common_type(pstate, colexprs[i], "VALUES", 
NULL);
 
@@ -1446,19 +1445,9 @@ transformValuesClause(ParseState *pstate, SelectStmt 
*stmt)
 
col = coerce_to_common_type(pstate, col, coltype, 
"VALUES");
lfirst(lc) = (void *) col;
-   if (first)
-   {
-   coltypmod = exprTypmod(col);
-   first = false;
-   }
-   else
-   {
-   /* As soon as we see a non-matching typmod, 
fall back to -1 */
-   if (coltypmod >= 0 && coltypmod != 
exprTypmod(col))
-   coltypmod = -1;
-   }
}
 
+   coltypmod = select_common_typmod(pstate, colexprs[i], coltype);
colcoll = select_common_collation(pstate, colexprs[i], true);
 
coltypes = lappend_oid(coltypes, coltype);
@@ -2020,8 +2009,6 @@ transformSetOperationTree(ParseState *pstate, SelectStmt 
*stmt,
Node   *rcolnode = (Node *) rtle->expr;
Oid lcoltype = exprType(lcolnode);
Oid rcoltype = exprType(rcolnode);
-   int32   lcoltypmod = exprTypmod(lcolnode);
-   int32   rcoltypmod = exprTypmod(rcolnode);
Node   *bestexpr;
int bestlocation;
Oid rescoltype;
@@ -2034,11 +2021,6 @@ transformSetOperationTree(ParseState *pstate, SelectStmt 
*stmt,

context,

);
bestlocation = exprLocation(bestexpr);
-   /* if same type and same typmod, use typmod; else 
default */
-   if (lcoltype == rcoltype && lcoltypmod == rcoltypmod)
-   rescoltypmod = lcoltypmod;
-   else
-   rescoltypmod = -1;
 
/*
 * Verify the coercions are actually possible.  If not, 
we'd fail
@@ -2089,6 +2071,10 @@ transformSetOperationTree(ParseState *pstate, SelectStmt 
*stmt,
rtle->expr = (Expr *) rcolnode;
}
 
+   rescoltypmod = select_common_typmod(pstate,
+ 

RE: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread tsunakawa.ta...@fujitsu.com
From: Kyotaro Horiguchi 
> It seems to respond to a statement-cancel signal immediately while
> waiting for a coming byte.  However, it seems to wait forever while
> waiting for space in the send-buffer. (Does that mean the session will be
> stuck if it sends a large chunk of bytes while the network is down?)

What part makes you worried about that?  libpq's send processing?

I've just examined pgfdw_cancel_query(), too.  As below, it uses a hidden 30 
second timeout.  After all, postgres_fdw also relies on timeout already.

/*
 * If it takes too long to cancel the query and discard the result, assume
 * the connection is dead.
 */
endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), 30000);

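For reference, a minimal sketch of how a libpq client can pair a cancel request with 
its own deadline, using only standard libpq calls (illustrative only, not postgres_fdw 
code; the 30-second figure just mirrors the timeout above):

```
#include <time.h>
#include <libpq-fe.h>

static void
cancel_and_drain(PGconn *conn)
{
	PGcancel   *cancel = PQgetCancel(conn);
	char		errbuf[256];
	time_t		deadline = time(NULL) + 30;	/* give up after ~30 s */

	if (cancel != NULL)
	{
		(void) PQcancel(cancel, errbuf, sizeof(errbuf));
		PQfreeCancel(cancel);
	}

	/* Discard whatever the server still sends, or give up at the deadline. */
	while (time(NULL) < deadline)
	{
		if (!PQconsumeInput(conn))
			break;				/* connection is probably dead */
		if (!PQisBusy(conn))
		{
			PGresult   *res = PQgetResult(conn);

			if (res == NULL)
				break;			/* fully drained, connection stays usable */
			PQclear(res);
		}
		/* a real implementation would wait on the socket here, not spin */
	}
}
```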

> After receiving a signal, it closes the problem connection. So the
> local session is usable after that but the failed remote sessions are
> closed and created another one at the next use.

I couldn't see that the problematic connection is closed when the cancellation 
fails... Am I looking at a wrong place?

/*
 * If connection is already unsalvageable, don't touch it
 * further.
 */
if (entry->changing_xact_state)
break;


Regards
Takayuki Tsunakawa





Re: Is Recovery actually paused?

2020-10-20 Thread Dilip Kumar
On Tue, Oct 20, 2020 at 1:22 PM Dilip Kumar  wrote:
>
> On Tue, Oct 20, 2020 at 1:11 PM Simon Riggs  wrote:
> >
> > On Mon, 19 Oct 2020 at 15:11, Dilip Kumar  wrote:
> >
> > > We have an interface to pause the WAL replay (pg_wal_replay_pause) and
> > > to know whether the WAL replay pause is requested
> > > (pg_is_wal_replay_paused).  But there is no way to know whether the
> > > recovery is actually paused or not.  Actually, the recovery process
> > > might process an extra WAL before pausing the recovery.  So does it
> > > make sense to provide a new interface to tell whether the recovery is
> > > actually paused or not?
> > >
> > > One solution could be that we convert the XLogCtlData->recoveryPause
> > > from bool to tri-state variable (0-> recovery not paused 1-> pause
> > > requested 2-> actually paused).
> > >
> > > Any opinion on this?
> >
> > Why would we want this? What problem are you trying to solve?
>
> The requirement is to know the last replayed WAL on the standby so
> unless we can guarantee that the recovery is actually paused we can
> never get the safe last_replay_lsn value.
>
> > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you 
> > wish?
>
> Maybe we can also do that but pg_is_wal_replay_paused is an existing
> API and the behavior is to know whether the recovery paused is
> requested or not,  So I am not sure is it a good idea to change the
> behavior of the existing API?
>

Attached is the POC patch to show what I have in mind.
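
To illustrate the intended usage (the function name below is made up for illustration; 
the actual name exposed by the patch may differ), a user on a standby could wait for a 
truly stable replay position like this:

```
-- pause replay, then poll until the startup process has really stopped
SELECT pg_wal_replay_pause();

DO $$
BEGIN
    WHILE NOT pg_is_wal_replay_actually_paused() LOOP  -- hypothetical name
        PERFORM pg_sleep(0.1);
    END LOOP;
END $$;

-- this LSN now stays stable until pg_wal_replay_resume() is called
SELECT pg_last_wal_replay_lsn();
```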

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 7aeb50bb94bdf031408a54b517d5288d687462a2 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Tue, 20 Oct 2020 14:10:14 +0530
Subject: [PATCH] New interface to know wal recovery actually paused

---
 src/backend/access/transam/xlog.c  | 42 +-
 src/backend/access/transam/xlogfuncs.c | 28 ---
 src/include/access/xlog.h  | 11 -
 src/include/catalog/pg_proc.dat|  4 
 4 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 52a67b1..4c50eb9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -714,8 +714,8 @@ typedef struct XLogCtlData
 	 * only relevant for replication or archive recovery
 	 */
 	TimestampTz currentChunkStartTime;
-	/* Are we requested to pause recovery? */
-	bool		recoveryPause;
+	/* Recovery pause state */
+	RecoveryPauseState		recoveryPause;
 
 	/*
 	 * lastFpwDisableRecPtr points to the start of the last replayed
@@ -5994,6 +5994,16 @@ recoveryStopsAfter(XLogReaderState *record)
 	return false;
 }
 
+static void
+CheckAndSetRecoveryPause(void)
+{
+	/* If recovery pause is requested then set it paused */
+	SpinLockAcquire(&XLogCtl->info_lck);
+	if (XLogCtl->recoveryPause == RECOVERY_PAUSE_REQUESTED)
+		XLogCtl->recoveryPause = RECOVERY_PAUSED;
+	SpinLockRelease(&XLogCtl->info_lck);
+}
+
 /*
  * Wait until shared recoveryPause flag is cleared.
  *
@@ -6025,12 +6035,20 @@ recoveryPausesHere(bool endOfRecovery)
 (errmsg("recovery has paused"),
  errhint("Execute pg_wal_replay_resume() to continue.")));
 
-	while (RecoveryIsPaused())
+	while (RecoveryPauseRequested())
 	{
+
 		HandleStartupProcInterrupts();
+
 		if (CheckForStandbyTrigger())
 			return;
 		pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
+
+		/*
+		 * While we are in the loop, user might resume and pause again so set
+		 * this every time.
+		 */
+		CheckAndSetRecoveryPause();
 		pg_usleep(1000000L);	/* 1000 ms */
 		pgstat_report_wait_end();
 	}
@@ -6039,17 +6057,29 @@ recoveryPausesHere(bool endOfRecovery)
 bool
 RecoveryIsPaused(void)
 {
+	bool	recoveryPause;
+
+	SpinLockAcquire(&XLogCtl->info_lck);
+	recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ? true : false;
+	SpinLockRelease(&XLogCtl->info_lck);
+
+	return recoveryPause;
+}
+
+bool
+RecoveryPauseRequested(void)
+{
 	bool		recoveryPause;
 
 	SpinLockAcquire(&XLogCtl->info_lck);
-	recoveryPause = XLogCtl->recoveryPause;
+	recoveryPause = (XLogCtl->recoveryPause != RECOVERY_IN_PROGRESS) ? true : false;
 	SpinLockRelease(&XLogCtl->info_lck);
 
 	return recoveryPause;
 }
 
 void
-SetRecoveryPause(bool recoveryPause)
+SetRecoveryPause(RecoveryPauseState recoveryPause)
 {
 	SpinLockAcquire(&XLogCtl->info_lck);
 	XLogCtl->recoveryPause = recoveryPause;
@@ -7423,7 +7453,7 @@ StartupXLOG(void)
 		proc_exit(3);
 
 	case RECOVERY_TARGET_ACTION_PAUSE:
-		SetRecoveryPause(true);
+		SetRecoveryPause(RECOVERY_PAUSE_REQUESTED);
 		recoveryPausesHere(true);
 
 		/* drop into promote */
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b..a01f146 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -538,7 +538,7 @@ pg_wal_replay_pause(PG_FUNCTION_ARGS)
  errhint("%s cannot be executed after promotion is triggered.",
 		 "pg_wal_replay_pause()")));
 

Re: Timing of relcache inval at parallel worker init

2020-10-20 Thread Kyotaro Horiguchi
At Sat, 17 Oct 2020 04:53:06 -0700, Noah Misch  wrote in 
> While reviewing what became commit fe4d022, I was surprised at the sequence of
> relfilenode values that RelationInitPhysicalAddr() computed for pg_class,
> during ParallelWorkerMain(), when running the last command of this recipe:
> 
>   begin;
>   cluster pg_class using pg_class_oid_index;
>   set force_parallel_mode = 'regress';
>   values (1);
> 
> There's $OLD_NODE (relfilenode in the committed relation map) and $NEW_NODE
> (relfilenode in this transaction's active_local_updates).  The worker performs
> RelationInitPhysicalAddr(pg_class) four times:
> 
> 1) $OLD_NODE in BackgroundWorkerInitializeConnectionByOid().
> 2) $OLD_NODE in RelationCacheInvalidate() directly.
> 3) $OLD_NODE in RelationReloadNailed(), indirectly via 
> RelationCacheInvalidate().
> 4) $NEW_NODE indirectly as part of the executor running the query.
> 
> I did expect $OLD_NODE in (1), since ParallelWorkerMain() calls
> BackgroundWorkerInitializeConnectionByOid() before
> StartParallelWorkerTransaction().  I expected $NEW_NODE in (2) and (3); that
> didn't happen, because ParallelWorkerMain() calls InvalidateSystemCaches()
> before RestoreRelationMap().  Let's move InvalidateSystemCaches() later.
> Invalidation should follow any worker initialization step that changes the
> results of relcache validation; otherwise, we'd need to ensure the
> InvalidateSystemCaches() will not validate any relcache entry.  Invalidation
> should precede any step that reads from a cache; otherwise, we'd need to redo
> that step after inval.  (Currently, no step reads from a cache.)  Many steps,
> e.g. AttachSerializableXact(), have no effect on relcache validation, so it's
> arbitrary whether they happen before or after inval.  I'm putting inval as

I agree to both the discussions.

> late as possible, because I think it's easier to confirm that a step doesn't
> read from a cache than to confirm that a step doesn't affect relcache
> validation.  An also-reasonable alternative would be to move inval and its
> prerequisites as early as possible.

The steps that this patch moves to before the invalidation seem to be in a
lower layer than snapshots, so that ordering seems reasonable.

> For reasons described in the attached commit message, this doesn't have
> user-visible consequences today.  Innocent-looking relcache.c changes might
> upheave that, so I'm proposing this on robustness grounds.  No need to
> back-patch.

I'm not sure about the necessity but lower-to-upper initialization
order is neat. I agree about not back-patching.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread tsunakawa.ta...@fujitsu.com
From: Kyotaro Horiguchi 
> At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada
>  wrote in
> > I think it doesn't matter whether in FDW framework or not. The user
> > normally doesn't care which backend processes connecting to foreign
> > servers. They will attempt to cancel the query like always if they
> > realized that a backend gets stuck. There are surely plenty of users
> > who use query cancellation.
> 
> The most serious impact from inability of canceling a query on a
> certain session is that server-restart is required to end such a
> session.

OK, at the risk of repeating myself, I didn't deny the need for cancellation.  Let's 
organize the argument.

* FDW in general
My understanding is that the FDW feature does not stipulate anything about 
cancellation.  In fact, odbc_fdw was uncancelable.  What do we do about this?

* postgres_fdw
Fortunately, it is (or should be?) cancelable whatever method we choose for 2PC.  
So no problem there.
But is it really cancelable now?  What if the libpq call is waiting for 
a response when the foreign server or network is down?

"Inability to cancel requires database server restart" feels a bit 
exaggerating, as libpq has tcp_keepalive* and tcp_user_timeout connection 
parameters, and even without setting them, TCP timeout works.
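
As a concrete illustration of that point, the libpq keepalives* and tcp_user_timeout 
settings can be attached to a postgres_fdw foreign server definition (the server name 
and values below are only examples; tcp_user_timeout needs a reasonably recent libpq):

```
CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '192.0.2.10', dbname 'appdb',
             keepalives '1', keepalives_idle '30',
             keepalives_interval '10', keepalives_count '3',
             tcp_user_timeout '60000');    -- milliseconds
```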


Regards
Takayuki Tsunakawa





Re: Is Recovery actually paused?

2020-10-20 Thread Dilip Kumar
On Tue, Oct 20, 2020 at 1:11 PM Simon Riggs  wrote:
>
> On Mon, 19 Oct 2020 at 15:11, Dilip Kumar  wrote:
>
> > We have an interface to pause the WAL replay (pg_wal_replay_pause) and
> > to know whether the WAL replay pause is requested
> > (pg_is_wal_replay_paused).  But there is no way to know whether the
> > recovery is actually paused or not.  Actually, the recovery process
> > might process an extra WAL before pausing the recovery.  So does it
> > make sense to provide a new interface to tell whether the recovery is
> > actually paused or not?
> >
> > One solution could be that we convert the XLogCtlData->recoveryPause
> > from bool to tri-state variable (0-> recovery not paused 1-> pause
> > requested 2-> actually paused).
> >
> > Any opinion on this?
>
> Why would we want this? What problem are you trying to solve?

The requirement is to know the last replayed WAL position on the standby:
unless we can guarantee that recovery is actually paused, we can never get
a safe last-replayed LSN value.

> If we do care, why not fix pg_is_wal_replay_paused() so it responds as you 
> wish?

Maybe we can also do that, but pg_is_wal_replay_paused is an existing API
and its current behavior is to report whether a recovery pause has been
requested, so I am not sure it is a good idea to change the behavior of
the existing API.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2020-10-20 Thread Simon Riggs
On Mon, 19 Oct 2020 at 15:11, Dilip Kumar  wrote:

> We have an interface to pause the WAL replay (pg_wal_replay_pause) and
> to know whether the WAL replay pause is requested
> (pg_is_wal_replay_paused).  But there is no way to know whether the
> recovery is actually paused or not.  Actually, the recovery process
> might process an extra WAL before pausing the recovery.  So does it
> make sense to provide a new interface to tell whether the recovery is
> actually paused or not?
>
> One solution could be that we convert the XLogCtlData->recoveryPause
> from bool to tri-state variable (0-> recovery not paused 1-> pause
> requested 2-> actually paused).
>
> Any opinion on this?

Why would we want this? What problem are you trying to solve?

If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish?

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread Kyotaro Horiguchi
At Tue, 20 Oct 2020 04:23:12 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Kyotaro Horiguchi 
> > > Furthermore, FDW is not cancellable in general.  So, I don't see a point 
> > > in
> > trying hard to make only commit be cancelable.
> > 
> > I think that it is quite important that operators can cancel any
> > process that has been stuck for a long time. Furthermore, postgres_fdw
> > is more likely to be stuck since network is involved so the usefulness
> > of that feature would be higher.
> 
> But lower than practical performance during normal operation.
> 
> BTW, speaking of network, how can postgres_fdw respond quickly to cancel 
> request when libpq is waiting for a reply from a down foreign server?  Can 
> the user continue to use that session after cancellation?

It seems to respond to a statement-cancel signal immediately while
waiting for an incoming byte.  However, it seems to wait forever while
waiting for space in the send buffer. (Does that mean the session will be
stuck if it sends a large chunk of bytes while the network is down?)

After receiving a signal, it closes the problematic connection. So the
local session is usable after that, but the failed remote sessions are
closed and new ones are created on the next use.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Commitfest manager 2020-11

2020-10-20 Thread gkokolatos






‐‐‐ Original Message ‐‐‐
> [snip]
>
> Wow, that was well in advance) I am willing to assist if you need any help.
>

Indeed it was a bit early. I left for vacation after that. For the record, I am 
newly active in the community. In our PUG in Stockholm, we held a meetup 
during which a contributor presented ways to contribute to the community, one 
of which is becoming CFM. So, I thought of picking up the recommendation.

I have taken little part in CFs as a reviewer/author and I have no idea how a CF 
is actually run. A contributor from Stockholm has offered to mentor me for the 
role.

Since you have both the knowledge and specific ideas on improving the CF, how 
about I assist you? I could shadow you and you could groom me for the role, so 
that I can take the lead in a future CF more effectively.

This is just a suggestion of course. I am happy with anything that can help the 
community as a whole.





Re: speed up unicode decomposition and recomposition

2020-10-20 Thread Michael Paquier
On Mon, Oct 19, 2020 at 10:34:33AM -0400, John Naylor wrote:
> I don't see any difference on gcc/Linux in those two files, nor in
> unicode_norm_shlib.o -- I do see a difference in unicode_norm_srv.o as
> expected. Could it depend on the compiler?

Hmm.  My guess is that you don't have --enable-debug in your set of
configure options?  It is not unusual to have this one enabled for GCC
even on production systems, and the size of the libs is impacted in
this case with your patch.
--
Michael




Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread Kyotaro Horiguchi
At Tue, 20 Oct 2020 15:53:29 +0900, Masahiko Sawada 
 wrote in 
> On Tue, 20 Oct 2020 at 13:23, tsunakawa.ta...@fujitsu.com
>  wrote:
> >
> > From: Kyotaro Horiguchi 
> > I don't think the inability to cancel all sessions at once can be a
> > reason not to allow operators to cancel a stuck session.
> >
> > Yeah, I didn't mean to discount the ability to cancel queries.  I just want 
> > to confirm how the user can use the cancellation in practice.  I didn't see 
> > how the user can use the cancellation in the FDW framework, so I asked 
> > about it.  We have to think about the user's context if we regard canceling 
> > commits as important.
> >
> 
> I think it doesn't matter whether in FDW framework or not. The user
> normally doesn't care which backend processes connecting to foreign
> servers. They will attempt to cancel the query like always if they
> realized that a backend gets stuck. There are surely plenty of users
> who use query cancellation.

The most serious impact of being unable to cancel a query on a certain
session is that a server restart is required to end such a session.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Add statistics to pg_stat_wal view for wal related parameter tuning

2020-10-20 Thread Masahiro Ikeda

On 2020-10-20 12:46, Amit Kapila wrote:
> On Tue, Oct 20, 2020 at 8:01 AM Masahiro Ikeda  wrote:
>>
>> Hi,
>>
>> I think we need to add some statistics to pg_stat_wal view.
>>
>> Although there are some parameter related WAL,
>> there are few statistics for tuning them.
>>
>> I think it's better to provide the following statistics.
>> Please let me know your comments.
>>
>> ```
>> postgres=# SELECT * from pg_stat_wal;
>> -[ RECORD 1 ]-------+------------------------------
>> wal_records         | 2000224
>> wal_fpi             | 47
>> wal_bytes           | 248216337
>> wal_buffers_full    | 20954
>> wal_init_file       | 8
>> wal_write_backend   | 20960
>> wal_write_walwriter | 46
>> wal_write_time      | 51
>> wal_sync_backend    | 7
>> wal_sync_walwriter  | 8
>> wal_sync_time       | 0
>> stats_reset         | 2020-10-20 11:04:51.307771+09
>> ```
>>
>> 1. Basic statistics of WAL activity
>>
>> - wal_records: Total number of WAL records generated
>> - wal_fpi: Total number of WAL full page images generated
>> - wal_bytes: Total amount of WAL bytes generated
>>
>> To understand DB's performance, first, we will check the performance
>> trends for the entire database instance.
>> For example, if the number of wal_fpi becomes higher, users may tune
>> "wal_compression", "checkpoint_timeout" and so on.
>>
>> Although users can check the above statistics via EXPLAIN, auto_explain,
>> autovacuum and pg_stat_statements now,
>> if users want to see the performance trends for the entire database,
>> they must recalculate the statistics.
>
> Here, do you mean to say 'entire cluster' instead of 'entire database'
> because it seems these stats are getting collected for the entire
> cluster?

Thanks for your comments.
Yes, I wanted to say 'entire cluster'.

>> I think it is useful to add the sum of the basic statistics.
>
> There is an argument that it is better to view these stats at the
> statement-level so that one can know which statements are causing most
> WAL and then try to rate-limit them if required in the application and
> anyway they can get the aggregate of all the WAL if they want. We have
> added these stats in PG-13, so do we have any evidence that the
> already added stats don't provide enough information? I understand
> that you are trying to display the accumulated stats here which if
> required users/DBA need to compute with the currently provided stats.
> OTOH, sometimes adding more ways to do some things causes difficulty
> for users to understand and learn.

I agree that statement-level stats are important, and I understand that
we can get the aggregated WAL stats from the pg_stat_statements view and
from autovacuum's logs.
But WAL stats generated by autovacuum are only available in the logs, and
they are not easy to aggregate. Since WAL writes affect the entire
cluster, I thought it natural to also provide accumulated values.

> I see that we also need to add extra code to capture these stats (some
> of which is in performance-critical path especially in
> XLogInsertRecord) which again makes me a bit uncomfortable. It might
> be that it is all fine as it is very important to collect these stats
> at cluster-level in spite that the same information can be gathered at
> statement-level to help customers but I don't see a very strong case
> for that in your proposal.

As for performance, I think the impact is small because the stats are
only incremented in memory. If I can reuse pgWalUsage, which already
collects these counters, there is no extra work in XLogInsertRecord at
all.

For example, how about having pg_stat_wal() report the accumulated
values of wal_records, wal_fpi, and wal_bytes from pgWalUsage?
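
As an illustration of how such accumulated counters would be consumed (assuming the 
proposed columns exist), the view can be sampled twice in psql and the delta computed 
for the interval of interest:

```
-- take a baseline; \gset stores the columns as psql variables
SELECT wal_records, wal_fpi, wal_bytes FROM pg_stat_wal \gset before_

-- ... run the workload to be observed ...

SELECT wal_records - :before_wal_records AS wal_records_delta,
       wal_fpi     - :before_wal_fpi     AS wal_fpi_delta,
       wal_bytes   - :before_wal_bytes   AS wal_bytes_delta
FROM pg_stat_wal;
```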

Regards
--
Masahiro Ikeda
NTT DATA CORPORATION




RE: [Patch] Optimize dropping of relation buffers using dlist

2020-10-20 Thread tsunakawa.ta...@fujitsu.com
From: Jamison, Kirk/ジャミソン カーク 
> However, I still can't seem to find the cause of why the non-recovery
> performance does not change when compared to master. (1 min 15 s for the
> given test case below)

Can you check and/or try the following?


1. Isn't the vacuum cost delay working?
VACUUM command should run without sleeping with the default settings.  Just in 
case, can you try with the settings:

vacuum_cost_delay = 0
vacuum_cost_limit = 10000
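
For instance (the table name is only an example), the settings can be applied for a 
single run like this:

```
SET vacuum_cost_delay = 0;      -- disables cost-based sleeping entirely
SET vacuum_cost_limit = 10000;
VACUUM (VERBOSE) test_table;
```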


2. Buffer strategy
The non-recovery VACUUM can differ from recovery in its use of shared buffers.  The 
VACUUM command uses only a 256 KB ring of shared buffers.  To make the VACUUM command 
use the whole shared buffers, can you modify src/backend/commands/vacuum.c so that 
GetAccessStrategy()'s argument is changed from BAS_VACUUM to BAS_NORMAL?  (I don't have 
much hope about this, though, because all blocks of the relations are already cached in 
shared buffers when VACUUM is run.)


Can you measure the time spent in DropRelFileNodeBuffers()?  You can call 
GetCurrentTimestamp() at the beginning and end of the function, and use 
TimestampDifference() to calculate the difference.  Then, for instance, emit 
elog(WARNING, "time is | %u.%u", sec, usec) at the end of the function.  You can use 
any elog() print format that is convenient for writing shell commands to filter the 
lines and sum up the total.
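
Concretely, the suggested instrumentation could look roughly like the fragment below 
inside DropRelFileNodeBuffers() (a throwaway sketch for measurement only, not proposed 
code):

```
	TimestampTz	start_time = GetCurrentTimestamp();
	long		secs;
	int			usecs;

	/* ... existing body of DropRelFileNodeBuffers() ... */

	TimestampDifference(start_time, GetCurrentTimestamp(), &secs, &usecs);
	elog(WARNING, "DropRelFileNodeBuffers time | %ld.%06d", secs, usecs);
```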



Regards
Takayuki Tsunakawa





Re: Wired if-statement in gen_partprune_steps_internal

2020-10-20 Thread Andy Fan
On Wed, Oct 14, 2020 at 11:26 AM Andy Fan  wrote:

>
>
> On Mon, Oct 12, 2020 at 4:37 PM Amit Langote 
> wrote:
>
>> Hi,
>>
>> On Thu, Oct 8, 2020 at 6:56 PM Andy Fan  wrote:
>> >
>> > Hi:
>> >
>> >   I found the following code in gen_partprune_steps_internal,  which
>> > looks the if-statement to be always true since list_length(results) > 1;
>> > I added an Assert(step_ids != NIL) and all the test cases passed.
>> > if the if-statement is always true,  shall we remove it to avoid
>> confusion?
>> >
>> >
>> > gen_partprune_steps_internal(GeneratePruningStepsContext *context,
>> >
>> >
>> > if (list_length(result) > 1)
>> > {
>> > List   *step_ids = NIL;
>> >
>> > foreach(lc, result)
>> > {
>> > PartitionPruneStep *step = lfirst(lc);
>> >
>> > step_ids = lappend_int(step_ids, step->step_id);
>> > }
>> > Assert(step_ids != NIL);
>> > if (step_ids != NIL) // This should always be true.
>> > {
>> > PartitionPruneStep *step;
>> >
>> > step = gen_prune_step_combine(context, step_ids,
>> >
>>PARTPRUNE_COMBINE_INTERSECT);
>> > result = lappend(result, step);
>> > }
>> > }
>>
>> That seems fine to me.
>>
>> Looking at this piece of code, I remembered that exactly the same
>> piece of logic is also present in gen_prune_steps_from_opexps(), which
>> looks like this:
>>
>> /* Lastly, add a combine step to mutually AND these op steps, if
>> needed */
>> if (list_length(opsteps) > 1)
>> {
>> List   *opstep_ids = NIL;
>>
>> foreach(lc, opsteps)
>> {
>> PartitionPruneStep *step = lfirst(lc);
>>
>> opstep_ids = lappend_int(opstep_ids, step->step_id);
>> }
>>
>> if (opstep_ids != NIL)
>> return gen_prune_step_combine(context, opstep_ids,
>>   PARTPRUNE_COMBINE_INTERSECT);
>> return NULL;
>> }
>> else if (opsteps != NIL)
>> return linitial(opsteps);
>>
>> I think we should remove this duplicative logic and return the
>> generated steps in a list from this function, which the code in
>> gen_partprune_steps_internal() then "combines" using an INTERSECT
>> step.  See attached a patch to show what I mean.
>>
>>
> This changes LGTM, and "make check" PASSED,  thanks for the patch!
>
>
I created https://commitfest.postgresql.org/30/2771/ so that this patch
will not
be lost.  Thanks!

-- 
Best Regards
Andy Fan


Re: Transactions involving multiple postgres foreign servers, take 2

2020-10-20 Thread Masahiko Sawada
On Tue, 20 Oct 2020 at 13:23, tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Kyotaro Horiguchi 
> > I don't think the inability to cancel all sessions at once can be a
> > reason not to allow operators to cancel a stuck session.
>
> Yeah, I didn't mean to discount the ability to cancel queries.  I just want 
> to confirm how the user can use the cancellation in practice.  I didn't see 
> how the user can use the cancellation in the FDW framework, so I asked about 
> it.  We have to think about the user's context if we regard canceling commits 
> as important.
>

I think it doesn't matter whether in FDW framework or not. The user
normally doesn't care which backend processes connecting to foreign
servers. They will attempt to cancel the query like always if they
realized that a backend gets stuck. There are surely plenty of users
who use query cancellation.

Regards,

-- 
Masahiko Sawada    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [patch] Fix checksum verification in base backups for zero page headers

2020-10-20 Thread Michael Paquier
On Fri, Oct 16, 2020 at 06:02:52PM +0300, Anastasia Lubennikova wrote:
> In the current patch, PageIsVerified called from pgbasebackup simply doesn't
> Should we change this too? I don't see any difference.

I considered that, but now that does not seem worth bothering here.

> Done.

Thanks for the updated patch.  I have played with dd and some fake
files to check the validity of the zero-case (as long as pd_lsn does
not go wild).  Here are some comments, with an updated patch attached:
- Added a PageIsVerifiedExtended to make this patch back-patchable,
with the original routine keeping its original shape.
- I did not like much to show the WARNING from PageIsVerified() for
the report, and we'd lose some context related to the base backups.
The important part is to know which blocks from which files are found
as problematic.
- Switched the checks to have PageIsVerified() placed first in the
hierarchy, so that we do all the basic validity checks before looking at
the LSN.  This does not really change things logically.
- The meaning of block_retry is also the opposite of what it should be
in the original code?  IMO, the flag should be set to true if we still
are fine to retry a block, and set it to false once the one-time retry
has failed.
- Moved the hardcoded failure report threshold of 5 into its own
variable.
- The error strings are not really consistent with the project style
in this area.  These are usually not split across multiple lines, to
ease searches with grep or such.

Anastasia, Michael B, does that look OK to you?

NB: This patch fixes only one problem, the zero-page case, as it was
the intention of Michael B to split this part into its own thread.  We
still have, of course, a second problem when it comes to a random LSN
put into the page header which could mask an actual checksum failure
so this only takes care of half the issues.  Having a correct LSN
approximation is a tricky problem IMO, but we could improve things by
having some delta with an extra WAL page on top of GetInsertRecPtr().
And this function cannot be used for base backups taken from
standbys.
--
Michael
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 51b8f994ac..1a28ee1130 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -404,26 +404,33 @@ do { \
  *		extern declarations
  * 
 * ----------------------------------------------------------------
+
+/* flags for PageAddItemExtended() */
 #define PAI_OVERWRITE			(1 << 0)
 #define PAI_IS_HEAP				(1 << 1)
 
+/* flags for PageIsVerifiedExtended() */
+#define PIV_LOG_WARNING			(1 << 0)
+#define PIV_REPORT_STAT			(1 << 1)
+
 #define PageAddItem(page, item, size, offsetNumber, overwrite, is_heap) \
 	PageAddItemExtended(page, item, size, offsetNumber, \
 		((overwrite) ? PAI_OVERWRITE : 0) | \
 		((is_heap) ? PAI_IS_HEAP : 0))
 
 /*
- * Check that BLCKSZ is a multiple of sizeof(size_t).  In PageIsVerified(),
- * it is much faster to check if a page is full of zeroes using the native
- * word size.  Note that this assertion is kept within a header to make
- * sure that StaticAssertDecl() works across various combinations of
- * platforms and compilers.
+ * Check that BLCKSZ is a multiple of sizeof(size_t).  In
+ * PageIsVerifiedExtended(), it is much faster to check if a page is
+ * full of zeroes using the native word size.  Note that this assertion
+ * is kept within a header to make sure that StaticAssertDecl() works
+ * across various combinations of platforms and compilers.
  */
 StaticAssertDecl(BLCKSZ == ((BLCKSZ / sizeof(size_t)) * sizeof(size_t)),
  "BLCKSZ has to be a multiple of sizeof(size_t)");
 
 extern void PageInit(Page page, Size pageSize, Size specialSize);
 extern bool PageIsVerified(Page page, BlockNumber blkno);
+extern bool PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags);
 extern OffsetNumber PageAddItemExtended(Page page, Item item, Size size,
 		OffsetNumber offsetNumber, int flags);
 extern Page PageGetTempPage(Page page);
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index dbbd3aa31f..d538f25726 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -443,7 +443,8 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst,
 
 		smgrread(src, forkNum, blkno, buf.data);
 
-		if (!PageIsVerified(page, blkno))
+		if (!PageIsVerifiedExtended(page, blkno,
+	PIV_LOG_WARNING | PIV_REPORT_STAT))
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
 	 errmsg("invalid page in block %u of relation %s",
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b89df01fa7..d1eb350b60 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1569,21 +1569,22 @@ sendFile(const char *readfilename, const char *tarfilename,
 {
 	int			fd;
 	BlockNumber blkno = 0;
-	bool		block_retry = false;
+	bool		block_retry = true;
 	char