Re: 2021-09 Commitfest

2021-09-10 Thread Jaime Casanova
On Wed, Sep 01, 2021 at 09:26:33AM -0500, Jaime Casanova wrote:
> On Wed, Sep 01, 2021 at 03:10:32PM +0200, Daniel Gustafsson wrote:
> > It is now 2021-09-01 Anywhere On Earth so I’ve set the September commitfest 
> > to
> > In Progress and opened the November one for new entries.  Jaime Casanova has
> > volunteered for CFM [0], so let’s help him close the 284 still open items in
> > the queue.
> > 
> 
> Thank you Daniel for editing the commitfest entries, that's something I
> cannot do.
> 
> And you're right, we have 284 patches in the queue (excluding committed, 
> returned with feedback, withdrawn and rejected)... 18 of them for more than
> 10 commitfests!
> 
> Needs review: 192. 
> Waiting on Author: 68. 
> Ready for Committer: 24
> 

Hi everyone,

In the first 10 days of this commitfest the numbers have moved, mostly
thanks to Daniel Gustafsson and good work from the committers:

Needs review: 171. 
Waiting on Author: 79. 
Ready for Committer: 15.

How can we make progress on the "Needs review" queue? It's just too long!

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: strange case of "if ((a & b))"

2021-09-10 Thread Michael Paquier
On Thu, Sep 09, 2021 at 02:14:50PM +0900, Kyotaro Horiguchi wrote:
> Maybe I'm missing something, but I can see several instances of the
> "eval-bool ? true : false" pattern after fd0625c7a9 that are not in
> the latest 0002.

Yep.  There are more of these; I have only looked at the ones covered by
the proposed patches.  What was sent looked clean enough to make some
progress and be done with it.
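
For anyone skimming the thread, the pattern in question is a redundant
ternary applied to an expression that is already boolean; a minimal
illustration (not taken from the patches):

#include <stdbool.h>

/* Redundant form: the comparison already yields a boolean. */
static bool
has_flag_verbose(unsigned flags, unsigned mask)
{
    return ((flags & mask) != 0) ? true : false;
}

/* Equivalent, simpler form. */
static bool
has_flag(unsigned flags, unsigned mask)
{
    return (flags & mask) != 0;
}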
--
Michael




Re: Teach pg_receivewal to use lz4 compression

2021-09-10 Thread Jian Guo
@@ -250,14 +302,18 @@ FindStreamingStart(uint32 *tli)
         /*
          * Check that the segment has the right size, if it's supposed to be
          * completed.  For non-compressed segments just check the on-disk size
-         * and see if it matches a completed segment. For compressed segments,
-         * look at the last 4 bytes of the compressed file, which is where the
-         * uncompressed size is located for gz files with a size lower than
-         * 4GB, and then compare it to the size of a completed segment. The 4
-         * last bytes correspond to the ISIZE member according to
+         * and see if it matches a completed segment. For zlib compressed
+         * segments, look at the last 4 bytes of the compressed file, which is
+         * where the uncompressed size is located for gz files with a size lower
+         * than 4GB, and then compare it to the size of a completed segment.
+         * The 4 last bytes correspond to the ISIZE member according to
          * http://www.zlib.org/rfc-gzip.html.
+         *
+         * For lz4 compressed segments read the header using the exposed API and
+         * compare the uncompressed file size, stored in
+         * LZ4F_frameInfo_t{.contentSize}, to that of a completed segment.
          */
-        if (!ispartial && !iscompress)
+        if (!ispartial && wal_compression_method == COMPRESSION_NONE)
         {
             struct stat statbuf;
             char        fullpath[MAXPGPATH * 2];
@@ -276,7 +332,7 @@ FindStreamingStart(uint32 *tli)
                 continue;
             }
         }
-        else if (!ispartial && iscompress)
+        else if (!ispartial && wal_compression_method == COMPRESSION_ZLIB)
         {
             int         fd;
             char        buf[4];
@@ -322,6 +378,70 @@ FindStreamingStart(uint32 *tli)
                 continue;
             }
         }
+        else if (!ispartial && compression_method == COMPRESSION_LZ4)
+        {
+#ifdef HAVE_LIBLZ4
+            int         fd;
+            int         r;
+            size_t      consumed_len = LZ4F_HEADER_SIZE_MAX;
+            char        buf[LZ4F_HEADER_SIZE_MAX];
+            char        fullpath[MAXPGPATH * 2];
+            LZ4F_frameInfo_t frame_info = { 0 };
+            LZ4F_decompressionContext_t ctx = NULL;
+
+            snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, dirent->d_name);
+
+            fd = open(fullpath, O_RDONLY | PG_BINARY, 0);
Should close the fd before exit or abort.
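
For illustration, the shape being asked for is simply that every early-exit
path releases the descriptor; a generic sketch, not the patch's actual code:

#include <fcntl.h>
#include <unistd.h>

/* Illustrative only: every early-exit path releases the descriptor. */
static int
read_header(const char *path, char *buf, size_t len)
{
    int         fd = open(path, O_RDONLY);
    ssize_t     r;

    if (fd < 0)
        return -1;

    r = read(fd, buf, len);
    if (r < 0 || (size_t) r != len)
    {
        close(fd);              /* don't leak the fd on the error path */
        return -1;
    }

    close(fd);
    return 0;
}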

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-09-10 Thread Dilip Kumar
On Wed, Sep 8, 2021 at 9:54 PM Robert Haas  wrote:

> On Fri, Sep 3, 2021 at 5:54 PM Andres Freund  wrote:
> > > I think we already have such a code in multiple places where we bypass
> the
> > > shared buffers for copying the relation
> > > e.g. index_copy_data(), heapam_relation_copy_data().
> >
> > That's not at all comparable. We hold an exclusive lock on the relation
> at
> > that point, and we don't have a separate implementation of reading
> tuples from
> > the table or something like that.
>
> I don't think there's a way to do this that is perfectly clean, so the
> discussion here is really about finding the least unpleasant
> alternative. I *really* like the idea of using pg_class to figure out
> what relations to copy. As far as I'm concerned, pg_class is the
> canonical list of what's in the database, and to the extent that the
> filesystem happens to agree, that's good luck. From that perspective,
> using the filesystem to figure out what to copy is by definition a
> hack.
>

I agree with you. Even though I think that scanning pg_class to identify the
relfilenodes is a more sensible thing to do than directly scanning the file
system, we need to consider one point: the current system (in CREATE DATABASE)
already scans the directory to copy the files, so instead of copying them
directly we would logically identify each relfilenode and then copy it block
by block. So maybe this approach will not make anyone unhappy, because it is
not any worse than the current system.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Signed vs Unsigned (take 2) (src/backend/storage/ipc/procarray.c)

2021-09-10 Thread Fujii Masao




On 2021/07/23 20:07, Ranier Vilela wrote:

On Fri, Jul 23, 2021 at 07:02, Aleksander Alekseev <aleksan...@timescale.com> wrote:

Hi hackers,

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

The patch was tested on MacOS against master `80ba4bb3`.

The new status of this patch is: Ready for Committer


The second patch seems fine too. I'm attaching both patches to trigger 
cfbot and to double-check them.

Thanks Aleksander, for reviewing this.


I looked at these patches because they are marked as ready for committer.
They don't change any actual behavior, but look valid to me in terms of coding.
Barring any objection, I will commit them.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: SQL:2011 application time

2021-09-10 Thread Jaime Casanova
On Mon, Sep 06, 2021 at 12:52:37PM -0700, Paul A Jungwirth wrote:
> On Sat, Sep 4, 2021 at 12:56 PM Jaime Casanova
>  wrote:
> >
> > patch 01: does apply but doesn't compile, attached the compile errors.
> > patch 04: does not apply clean.
> 
> Thanks for taking a look! I've rebased & made it compile again. v7 attached.
> 

patch 01: does apply but gives a compile warning (which is fixed by patch
02)
"""
parse_utilcmd.c: In function ‘generateClonedIndexStmt’:
parse_utilcmd.c:1730:2: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]
  Period *p = makeNode(Period);
  ^~
"""

patch 03: produces these compile errors.  

analyze.c: In function ‘transformForPortionOfBound’:
analyze.c:1171:3: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]
   A_Const*n2 = makeNode(A_Const);
   ^~~
analyze.c:1172:10: error: ‘union ValUnion’ has no member named ‘type’
   n2->val.type = T_Null;
  ^
analyze.c:1172:18: error: ‘T_Null’ undeclared (first use in this function)
   n2->val.type = T_Null;
  ^~
analyze.c:1172:18: note: each undeclared identifier is reported only once for 
each function it appears in
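
For reference, after the recent Value-node refactoring a SQL NULL constant
appears to be represented by the A_Const node's isnull flag rather than a
T_Null tag, so the fix is presumably something along these lines (untested
sketch):

    A_Const    *n2 = makeNode(A_Const);

    n2->isnull = true;      /* replaces the old "n2->val.type = T_Null" */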



-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: Estimating HugePages Requirements?

2021-09-10 Thread Bossart, Nathan
On 9/10/21, 1:02 PM, "Robert Haas"  wrote:
> On Thu, Sep 9, 2021 at 5:53 PM Bossart, Nathan  wrote:
>> I think it might be clearer to
>> somehow indicate that the value is essentially the size of the main
>> shared memory area in terms of the huge page size, but I'm not sure
>> how to do that concisely.  Perhaps it is enough to just make sure the
>> description of "huge_pages_required" is detailed enough.
>
> shared_memory_size_in_huge_pages? It's kinda long, but a long name
> that you can understand without reading the docs is better than a
> short one where you can't.

I think that's an improvement.  The only other idea I have at the
moment is num_huge_pages_required_for_shared_memory.

Nathan



Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-09-10 Thread Melanie Plageman
On Wed, Sep 8, 2021 at 9:28 PM Melanie Plageman
 wrote:
>
> On Fri, Aug 13, 2021 at 3:08 AM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2021-08-11 16:11:34 -0400, Melanie Plageman wrote:
> > > On Tue, Aug 3, 2021 at 2:13 PM Andres Freund  wrote:
> > > > > Also, I'm unsure how writing the buffer action stats out in
> > > > > pgstat_write_statsfiles() will work, since I think that backends can
> > > > > update their buffer action stats after we would have already persisted
> > > > > the data from the BufferActionStatsArray -- causing us to lose those
> > > > > updates.
> > > >
> > > > I was thinking it'd work differently. Whenever a connection ends, it 
> > > > reports
> > > > its data up to pgstats.c (otherwise we'd loose those stats). By the time
> > > > shutdown happens, they all need to have already have reported their 
> > > > stats - so
> > > > we don't need to do anything to get the data to pgstats.c during 
> > > > shutdown
> > > > time.
> > > >
> > >
> > > When you say "whenever a connection ends", what part of the code are you
> > > referring to specifically?
> >
> > pgstat_beshutdown_hook()
> >
> >
> > > Also, when you say "shutdown", do you mean a backend shutting down or
> > > all backends shutting down (including postmaster) -- like pg_ctl stop?
> >
> > Admittedly our language is very imprecise around this :(. What I meant
> > is that backends would report their own stats up to the stats collector
> > when the connection ends (in pgstat_beshutdown_hook()). That means that
> > when the whole server (pgstat and then postmaster, potentially via
> > pg_ctl stop) shuts down, all the per-connection stats have already been
> > reported up to pgstat.
> >
>
> So, I realized that the patch has a problem. I added the code to send
> buffer actions stats to the stats collector
> (pgstat_send_buffer_actions()) to pgstat_report_stat() and this isn't
> getting called when all types of backends exit.
>
> I originally thought to add pgstat_send_buffer_actions() to
> pgstat_beshutdown_hook() (as suggested), but, this is called after
> pgstat_shutdown_hook(), so, we aren't able to send stats to the stats
> collector at that time. (pgstat_shutdown_hook() sets pgstat_is_shutdown
> to true and then in pgstat_beshutdown_hook() (called after), if we call
> pgstat_send_buffer_actions(), it calls pgstat_send() which calls
> pgstat_assert_is_up() which trips when pgstat_is_shutdown is true.)
>
> After calling pgstat_send_buffer_actions() from pgstat_report_stat(), it
> seems to miss checkpointer stats entirely. I did find that if I
> sprinkled pgstat_send_buffer_actions() around in the various places that
> pgstat_send_checkpointer() is called, I could get checkpointer stats
> (see attached patch, capture_checkpointer_buffer_actions.patch), but,
> that seems a little bit haphazard since pgstat_send_buffer_actions() is
> supposed to capture stats for all backend types. Is there somewhere else
> I can call it that is exercised by all backend types before
> pgstat_shutdown_hook() is called but after they would have finished any
> relevant buffer actions?
>

I realized that putting these additional calls in checkpointer code and
not clearing out PgBackendStatus counters for buffer actions results in
a lot of duplicate stats. I was wondering, however, whether
pgstat_send_buffer_actions() is needed in
HandleCheckpointerInterrupts() before the proc_exit().

It does seem like additional calls to pgstat_send_buffer_actions()
shouldn't be needed since most processes register
pgstat_shutdown_hook(). However, since MyDatabaseId isn't valid for the
auxiliary processes, even though the pgstat_shutdown_hook() is
registered from BaseInit(), pgstat_report_stat() never gets called for
them, so their stats aren't persisted using the current method.

It seems like the best solution to persisting all processes' stats would
be to have all processes register pgstat_shutdown_hook() and to still
call pgstat_report_stat() even if MyDatabaseId is not valid if the
process is not a regular backend (I assume that it is only a problem
that MyDatabaseId is InvalidOid for backends that have had it set to a
valid oid at some point). For the stats that rely on database OID,
perhaps those can be reported based on whether or not MyDatabaseId is
valid from within pgstat_report_stat().

I also realized that I am not collecting stats from live auxiliary
processes in pg_stat_get_buffer_actions(). I need to change the loop to
for (i = 0; i <= MaxBackends + NUM_AUXPROCTYPES; i++) to actually get
stats from live auxiliary processes when querying the view.

On an unrelated note, I am planning to remove buffers_clean and
buffers_checkpoint from the pg_stat_bgwriter view since those are also
redundant. When I was removing them, I noticed that buffers_checkpoint
and buffers_clean count buffers as having been written even when
FlushBuffer() "does nothing" because someone else wrote out the dirty
buffer before the bgwriter or checkpointer had a chance to do it. This
seems 

Re: Remove_temp_files_after_crash and significant recovery/startup time

2021-09-10 Thread Euler Taveira
On Fri, Sep 10, 2021, at 5:58 PM, McCoy, Shawn wrote:
> I noticed that the new parameter remove_temp_files_after_crash is currently 
> set to a default value of "true" in the version 14 release. It seems this was 
> discussed in this thread [1], and it doesn't look to me like there's been a 
> lot of stress testing of this feature.
>  
> In our fleet there have been cases where we have seen hundreds of thousands 
> of temp files generated.  I found a case where we helped a customer that had 
> a little over 2.2 million temp files.  Single threaded cleanup of these takes 
> a significant amount of time and delays recovery. In RDS, we mitigated this 
> by moving the pgsql_tmp directory aside, start the engine and then separately 
> remove the old temp files.
2.2 million temporary files? I'm wondering in what circumstances your system is
generating those temporary files. Low work_mem and thousands of connections?
Low work_mem and a huge analytic query? When I designed this feature I thought
about some extreme cases, that's why this behavior is controlled by a GUC. We
can probably add a sentence that explains the recovery delay caused by tens of
thousands of temporary files.

> 
> After noticing the current plans to default this GUC to "on" in v14, just 
> thought I'd raise the question of whether this should get a little more 
> discussion or testing with higher numbers of temp files?
>  
A backend crash is per se a rare condition (at least it should be). Crashing
while having millions of temporary files in your PGDATA is an even rarer
condition. I have seen several cases related to this issue and none of them
generated millions of temporary files (at most a thousand). IMO the benefits
outweigh the issues, as I explained in [1]. Service continuity (for the vast
majority of cases) justifies turning it on by default.

If your Postgres instance is generating millions of temporary files, it seems
your setup needs some tuning.


[1] 
https://www.postgresql.org/message-id/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
 

--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Remove_temp_files_after_crash and significant recovery/startup time

2021-09-10 Thread Tomas Vondra

On 9/10/21 10:58 PM, McCoy, Shawn wrote:
> I noticed that the new parameter remove_temp_files_after_crash is
> currently set to a default value of "true" in the version 14 release. It
> seems this was discussed in this thread [1], and it doesn't look to me
> like there's been a lot of stress testing of this feature.

I'm not sure what we could learn from a stress test. IMHO it's fairly
natural that if there are many temporary files and/or if deleting a file
is expensive on a given filesystem, the cleanup may take time.

> In our fleet there have been cases where we have seen hundreds of
> thousands of temp files generated.  I found a case where we helped a
> customer that had a little over 2.2 million temp files.  Single threaded
> cleanup of these takes a significant amount of time and delays recovery.
> In RDS, we mitigated this by moving the pgsql_tmp directory aside, start
> the engine and then separately remove the old temp files.
>
> After noticing the current plans to default this GUC to "on" in v14,
> just thought I'd raise the question of whether this should get a little
> more discussion or testing with higher numbers of temp files?

I doubt we can learn anything new from such testing.

Of course, we can discuss the default for the GUC. I see it as a trade
off between risk of running out of disk space and increased recovery
time, and perhaps the decision to prioritize lower risk of running out
of disk space was not the right one ...



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Remove_temp_files_after_crash and significant recovery/startup time

2021-09-10 Thread Tom Lane
"McCoy, Shawn"  writes:
> I noticed that the new parameter remove_temp_files_after_crash is currently 
> set to a default value of "true" in the version 14 release. It seems this was 
> discussed in this thread [1], and it doesn't look to me like there's been a 
> lot of stress testing of this feature.

Probably not ...

> In our fleet there have been cases where we have seen hundreds of thousands 
> of temp files generated.  I found a case where we helped a customer that had 
> a little over 2.2 million temp files.  Single threaded cleanup of these takes 
> a significant amount of time and delays recovery. In RDS, we mitigated this 
> by moving the pgsql_tmp directory aside, start the engine and then separately 
> remove the old temp files.

TBH, I think the thing to be asking questions about is how come you had so
many temp files in the first place.  Sounds like something is misadjusted
somewhere.

regards, tom lane




Re: ORDER BY pushdowns seem broken in postgres_fdw

2021-09-10 Thread David Zhang

On 2021-09-06 1:16 a.m., Ronan Dunklau wrote:
> On Friday, September 3, 2021 at 22:54:25 CEST, David Zhang wrote:
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  tested, failed
>> Implements feature:       tested, passed
>> Spec compliant:           not tested
>> Documentation:            not tested
>>
>> Applied the v6 patch to master branch and ran regression test for contrib,
>> the result was "All tests successful."
>
> What kind of error did you get running make installcheck-world? If it passed
> the make check for contrib, I can't see why it would fail running make
> installcheck-world.

Just to clarify, the error I encountered was not related to patch v6,
it was related to other extensions.

> In any case, I just checked and running make installcheck-world doesn't
> produce any error.
>
> Since HEAD had moved a bit since the last version, I rebased the patch,
> resulting in the attached v7.
>
> Best regards,
>
> --
> Ronan Dunklau

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca




Re: slab allocator performance issues

2021-09-10 Thread Tomas Vondra

Hi,

I've been investigating the regressions in some of the benchmark 
results, together with the generation context benchmarks [1].


Turns out it's pretty difficult to benchmark this, because the results 
strongly depend on what the backend did before. For example if I run 
slab_bench_fifo with the "decreasing" test for 32kB blocks and 512B 
chunks, I get this:


  select * from slab_bench_fifo(100, 32768, 512, 100, 1, 5000);

   mem_allocated | alloc_ms | free_ms
  ---------------+----------+---------
       528547840 |   155394 |   87440


i.e. palloc() takes ~155ms and pfree() ~87ms (and these result are 
stable, the numbers don't change much with more runs).


But if I run a set of "lifo" tests in the backend first, the results 
look like this:


   mem_allocated | alloc_ms | free_ms
  ---------------+----------+---------
       528547840 |    41728 |   71524
  (1 row)

so the pallocs are suddenly about ~4x faster. Clearly, what the backend 
did before may have pretty dramatic impact on results, even for simple 
benchmarks like this.


Note: The benchmark was a single SQL script, running all the different 
workloads in the same backend.


I did a fair amount of perf profiling, and the main difference between 
the slow and fast runs seems to be this:


             0  page-faults:u
             0  minor-faults:u
             0  major-faults:u

vs

    20,634,153  page-faults:u
    20,634,153  minor-faults:u
             0  major-faults:u



Attached is a more complete perf stat output, but the page faults seem 
to be the main issue. My theory is that in the "fast" case, the past 
backend activity puts the glibc memory management into a state that 
prevents page faults in the benchmark.


But of course, this theory may be incomplete - for example it's not 
clear why running the benchmark repeatedly would not "condition" the 
backend the same way. But it doesn't - it's ~150ms even for repeated runs.


Secondly, I'm not sure this explains why some of the timings actually 
got much slower with the 0003 patch, when the sequence of the steps is 
still the same. Of course, it's possible 0003 changes the allocation 
pattern a bit, interfering with glibc memory management.


This leads to a couple of interesting questions, I think:

1) I've only tested this on Linux, with glibc. I wonder how it'd behave 
on other platforms, or with other allocators.


2) Which cases are more important? When the backend was warmed up, or 
when each benchmark runs in a new backend? It seems the "new backend" is 
something like a "worst case" leading to more page faults, so maybe 
that's the thing to watch. OTOH it's unlikely to have a completely new 
backend, so maybe not.


3) Can this teach us something about how to allocate stuff, to better 
"prepare" the backend for future allocations? For example, it's a bit 
strange that repeated runs of the same benchmark don't do the trick, for 
some reason.




regards


[1] 
https://www.postgresql.org/message-id/bcdd4e3e-c12d-cd2b-7ead-a91ad416100a%40enterprisedb.com


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
 Performance counter stats for process id '11829':

   219,869,739,737      cycles:u
   119,722,280,436      instructions:u            #    0.54  insn per cycle
                                                  #    1.56  stalled cycles per insn
     7,784,566,858      cache-references:u
     2,487,257,287      cache-misses:u            #   31.951 % of all cache refs
     5,942,054,520      bus-cycles:u
                 0      page-faults:u
                 0      minor-faults:u
                 0      major-faults:u
   187,181,661,719      stalled-cycles-frontend:u #   85.13% frontend cycles idle
   144,274,017,071      stalled-cycles-backend:u  #   65.62% backend cycles idle

      60.000876248 seconds time elapsed


 Performance counter stats for process id '11886':

   145,093,090,692      cycles:u
    74,986,543,212      instructions:u            #    0.52  insn per cycle
                                                  #    2.63  stalled cycles per insn
     4,753,764,781      cache-references:u
     1,342,653,549      cache-misses:u            #   28.244 % of all cache refs
     3,925,175,515      bus-cycles:u
        20,634,153      page-faults:u
        20,634,153      minor-faults:u

Re: Estimating HugePages Requirements?

2021-09-10 Thread Robert Haas
On Thu, Sep 9, 2021 at 5:53 PM Bossart, Nathan  wrote:
> For 0001, the biggest thing on my mind at the moment is the name of
> the GUC.  "huge_pages_required" feels kind of ambiguous.  From the
> name alone, it could mean either "the number of huge pages required"
> or "huge pages are required for the server to run." Also, the number
> of huge pages required is not actually required if you don't want to
> run the server with huge pages.

+1 to all of that.

> I think it might be clearer to
> somehow indicate that the value is essentially the size of the main
> shared memory area in terms of the huge page size, but I'm not sure
> how to do that concisely.  Perhaps it is enough to just make sure the
> description of "huge_pages_required" is detailed enough.

shared_memory_size_in_huge_pages? It's kinda long, but a long name
that you can understand without reading the docs is better than a
short one where you can't.
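
For what it's worth, the value is just the main shared memory size rounded up
to whole huge pages; a small standalone illustration with made-up sizes:

#include <stdio.h>

int
main(void)
{
    /* example figures only: 16GB of shared memory, 2MB huge pages */
    unsigned long long shmem_bytes = 16ULL * 1024 * 1024 * 1024;
    unsigned long long huge_page_bytes = 2ULL * 1024 * 1024;
    unsigned long long pages =
        (shmem_bytes + huge_page_bytes - 1) / huge_page_bytes;

    printf("shared_memory_size_in_huge_pages = %llu\n", pages);  /* 8192 */
    return 0;
}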

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: extensible options syntax for replication parser?

2021-09-10 Thread Robert Haas
On Thu, Oct 8, 2020 at 10:33 AM Robert Haas  wrote:
> That patch set includes this patch, and the reason for the behavior
> difference turned out to be that I had gotten an if-test that is part
> of this patch backwards. Here is v3, fixing that. It is a little
> disappointing that this mistake didn't cause any existing regression
> tests to fail.

I'm returning to this topic ~11 months later with a more definite
intent to get something committed, since my "refactoring basebackup.c"
patch set - that also adds server-side compression and server-side
backup - needs to add more options to BASE_BACKUP, and doubling down
on the present options-parsing strategy seems like a horrible idea.
I've now split this into two patches, one for BASE_BACKUP, and the
other for CREATE_REPLICATION_SLOT. I've rebased the patches and added
documentation as well. The CREATE_REPLICATION_SLOT patch now unifies
EXPORT_SNAPSHOT, NOEXPORT_SNAPSHOT, and USE_SNAPSHOT, which are
mutually exclusive choices, into SNAPSHOT { 'export' | 'use' |
'nothing' } which IMHO is clearer.
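
For instance, where the current grammar has the caller append a bare keyword, e.g.

    CREATE_REPLICATION_SLOT "slot1" LOGICAL "pgoutput" EXPORT_SNAPSHOT

the flexible-options form would express the same choice as an option list,
roughly like this (exact spelling per the attached patch):

    CREATE_REPLICATION_SLOT "slot1" LOGICAL "pgoutput" (SNAPSHOT 'export')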

Last call for complaints about either the overall direction or the
specific implementation choices...

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v5-0001-Flexible-options-for-BASE_BACKUP.patch
Description: Binary data


v5-0002-Flexible-options-for-CREATE_REPLICATION_SLOT.patch
Description: Binary data


Re: parallelizing the archiver

2021-09-10 Thread Andrey Borodin



> On Sep 10, 2021, at 22:18, Bossart, Nathan wrote:
> 
> I was thinking that archive_batch_size would be the maximum batch
> size.  If the archiver only finds a single file to archive, that's all
> it'd send to the archive command.  If it finds more, it'd send up to
> archive_batch_size to the command.

I think that the concept of a "batch" is misleading.
If you pass filenames via stdin you don't need to know all the names upfront.
Just send more names to the pipe if the archive_command is still running and
one more segment has just become available.
This way the level of parallelism will adapt to the workload.
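
A rough standalone sketch of that shape, with a made-up worker command name
(the real archiver would feed names as segments complete rather than
hard-coding them):

#include <stdio.h>

int
main(void)
{
    /* hypothetical long-lived command that archives names read from stdin */
    FILE       *cmd = popen("my_archive_worker", "w");

    if (cmd == NULL)
        return 1;

    /* as segments become ready, push their names down the pipe */
    fprintf(cmd, "%s\n", "000000010000000000000001");
    fflush(cmd);

    /* ... keep feeding names for as long as the command keeps running ... */

    return pclose(cmd) == 0 ? 0 : 1;
}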

Best regards, Andrey Borodin.



Re: WIP: System Versioned Temporal Table

2021-09-10 Thread Jaime Casanova
On Tue, Aug 10, 2021 at 01:20:14PM +0100, Simon Riggs wrote:
> On Wed, 14 Jul 2021 at 12:48, vignesh C  wrote:
> 
> > The patch does not apply on Head anymore, could you rebase and post a
> > patch. I'm changing the status to "Waiting for Author".
> 
> OK, so I've rebased the patch against current master to take it to v15.
> 
> I've then worked on the patch some myself to make v16 (attached),
> adding these things:
> 

Hi Simon,

This one no longer applies or compiles.
Can we expect a rebase soon?


-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: incorrect file name in backend_progress.c header

2021-09-10 Thread Zhihong Yu
On Fri, Sep 10, 2021 at 11:20 AM Justin Pryzby  wrote:

> > For the first list, do you want to include the path to the file for
> > IDENTIFICATION ?
> > If so, I can prepare a patch covering the files in that list.
>
> Since there's so few exceptions to the "rule", I think they should be
> fixed for
> consistency.
>
> Here's an awk which finds a few more - including the one in your original
> report.
>
> $ find src -name '*.[ch]' -type f -print0 |xargs -r0 awk
> '{fn=gensub(".*/", "", "1", FILENAME)} FILENAME~/scripts/{fn=gensub("\\.c",
> "", 1, fn)} FNR==1 && /---$/{top=1} /\*\//{top=0} !top{next} FNR>1 && FNR<4
> && NF==2 && $2!=fn{print FILENAME,"head",fn,$2} /IDENTIFICATION/{getline;
> if ($0!~FILENAME){print FILENAME,"foot",$2}}'
>
> src/include/utils/dynahash.h head dynahash.h dynahash
> src/include/replication/pgoutput.h foot pgoutput.h
> src/backend/catalog/pg_publication.c foot pg_publication.c
> src/backend/utils/activity/wait_event.c foot
> src/backend/postmaster/wait_event.c
> src/backend/utils/activity/backend_status.c foot
> src/backend/postmaster/backend_status.c
> src/backend/utils/activity/backend_progress.c head backend_progress.c
> progress.c
> src/backend/utils/adt/version.c foot
> src/backend/replication/logical/reorderbuffer.c foot
> src/backend/replication/reorderbuffer.c
> src/backend/replication/logical/snapbuild.c foot
> src/backend/replication/snapbuild.c
> src/backend/replication/logical/logicalfuncs.c foot
> src/backend/replication/logicalfuncs.c
> src/backend/optimizer/util/inherit.c foot
> src/backend/optimizer/path/inherit.c
> src/backend/optimizer/util/appendinfo.c foot
> src/backend/optimizer/path/appendinfo.c
> src/backend/commands/publicationcmds.c foot publicationcmds.c
> src/backend/commands/subscriptioncmds.c foot subscriptioncmds.c
> src/interfaces/libpq/fe-misc.c head fe-misc.c FILE
> src/bin/scripts/common.c head common common.c
> src/port/pgcheckdir.c head pgcheckdir.c src/port/pgcheckdir.c
>
> --
> Justin
>

Here is an updated patch covering the second batch of files.


header-ident.patch
Description: Binary data


Re: incorrect file name in backend_progress.c header

2021-09-10 Thread Justin Pryzby
> For the first list, do you want to include the path to the file for
> IDENTIFICATION ?
> If so, I can prepare a patch covering the files in that list.

Since there's so few exceptions to the "rule", I think they should be fixed for
consistency.

Here's an awk which finds a few more - including the one in your original
report.

$ find src -name '*.[ch]' -type f -print0 |xargs -r0 awk '{fn=gensub(".*/", "", 
"1", FILENAME)} FILENAME~/scripts/{fn=gensub("\\.c", "", 1, fn)} FNR==1 && 
/---$/{top=1} /\*\//{top=0} !top{next} FNR>1 && FNR<4 && NF==2 && $2!=fn{print 
FILENAME,"head",fn,$2} /IDENTIFICATION/{getline; if ($0!~FILENAME){print 
FILENAME,"foot",$2}}'

src/include/utils/dynahash.h head dynahash.h dynahash
src/include/replication/pgoutput.h foot pgoutput.h
src/backend/catalog/pg_publication.c foot pg_publication.c
src/backend/utils/activity/wait_event.c foot src/backend/postmaster/wait_event.c
src/backend/utils/activity/backend_status.c foot src/backend/postmaster/backend_status.c
src/backend/utils/activity/backend_progress.c head backend_progress.c progress.c
src/backend/utils/adt/version.c foot
src/backend/replication/logical/reorderbuffer.c foot src/backend/replication/reorderbuffer.c
src/backend/replication/logical/snapbuild.c foot src/backend/replication/snapbuild.c
src/backend/replication/logical/logicalfuncs.c foot src/backend/replication/logicalfuncs.c
src/backend/optimizer/util/inherit.c foot src/backend/optimizer/path/inherit.c
src/backend/optimizer/util/appendinfo.c foot src/backend/optimizer/path/appendinfo.c
src/backend/commands/publicationcmds.c foot publicationcmds.c
src/backend/commands/subscriptioncmds.c foot subscriptioncmds.c
src/interfaces/libpq/fe-misc.c head fe-misc.c FILE
src/bin/scripts/common.c head common common.c
src/port/pgcheckdir.c head pgcheckdir.c src/port/pgcheckdir.c

-- 
Justin




Re: incorrect file name in backend_progress.c header

2021-09-10 Thread Zhihong Yu
On Fri, Sep 10, 2021 at 11:07 AM Zhihong Yu  wrote:

>
>
> On Fri, Sep 10, 2021 at 10:56 AM Justin Pryzby 
> wrote:
>
>> On Fri, Sep 10, 2021 at 10:11:34AM -0700, Zhihong Yu wrote:
>> > Hi,
>> > I was looking at backend_progress.c and noticed that the filename and
>> path
>> > were wrong in the header.
>> >
>> > Here is patch which corrects the mistake.
>>
>> For the record, I don't really like boilerplate, but fixing the
>> boilerplates
>> all at once is at least better than fixing them one at a time.
>>
>> Would you want to prepare a patch handling all of these ?
>>
>> $ find src/ -name '*.c' -type f -print0 |xargs -r0 awk '{fn=gensub(".*/",
>> "", "1", FILENAME)} FILENAME~/scripts/{fn=gensub("\\.c", "", 1, fn)} FNR==1
>> && /---/{top=1} /\*\//{top=0} !top{next} FNR==3 && NF==2 && $2!=fn{print
>> FILENAME,"head",fn,$2} /IDENTIFICATION/{getline; if ($0!~FILENAME){print
>> FILENAME,"foot",$2}}'
>> src/backend/catalog/pg_publication.c foot pg_publication.c
>> src/backend/utils/activity/wait_event.c foot
>> src/backend/postmaster/wait_event.c
>> src/backend/utils/activity/backend_status.c foot
>> src/backend/postmaster/backend_status.c
>> src/backend/utils/adt/version.c foot
>> src/backend/replication/logical/reorderbuffer.c foot
>> src/backend/replication/reorderbuffer.c
>> src/backend/replication/logical/snapbuild.c foot
>> src/backend/replication/snapbuild.c
>> src/backend/replication/logical/logicalfuncs.c foot
>> src/backend/replication/logicalfuncs.c
>> src/backend/optimizer/util/inherit.c foot
>> src/backend/optimizer/path/inherit.c
>> src/backend/optimizer/util/appendinfo.c foot
>> src/backend/optimizer/path/appendinfo.c
>> src/backend/commands/publicationcmds.c foot publicationcmds.c
>> src/backend/commands/subscriptioncmds.c foot subscriptioncmds.c
>> src/interfaces/libpq/fe-misc.c head fe-misc.c FILE
>> src/bin/scripts/common.c head common common.c
>> src/port/pgcheckdir.c head pgcheckdir.c src/port/pgcheckdir.c
>>
>> There's some function comments wrong too.
>> In case someone want to fix them together.
>>
>> commit 4fba6c5044da43c1fa263125e422e869ae449ae7
>> Author: Justin Pryzby 
>> Date:   Sun Sep 5 18:14:39 2021 -0500
>>
>> Wrong function name in header comment
>>
>> Found like this
>>
>> for f in `find src -type f -name '*.c'`; do awk -v p=0 '/^\/\*
>> *-*$/{h=$0; getline; h=h"\n"$0; g=gensub( "[^_[:alnum:]].*", "", 1, $2);
>> p=1} 0&&/^{/{p=0; print h}; /^ \*\/$/{h=h"\n"$0; getline a; h=h"\n"a;
>> getline f; h=h"\n"f; l=length(g); if (substr(f,1,7) == substr(g,1,7) &&
>> substr(f,1,l) != substr(g,1,l)) print FILENAME,g,f,"\n"h; next} 0&&/^[^s/
>> {]/{p=0; h=""; next} 0&{h=h"\n"$0}' "$f"; done |less
>>
>> diff --git a/src/backend/optimizer/util/pathnode.c
>> b/src/backend/optimizer/util/pathnode.c
>> index cedb3848dd..e53d381e19 100644
>> --- a/src/backend/optimizer/util/pathnode.c
>> +++ b/src/backend/optimizer/util/pathnode.c
>> @@ -105,7 +105,7 @@ compare_path_costs(Path *path1, Path *path2,
>> CostSelector criterion)
>>  }
>>
>>  /*
>> - * compare_path_fractional_costs
>> + * compare_fractional_path_costs
>>   *   Return -1, 0, or +1 according as path1 is cheaper, the same
>> cost,
>>   *   or more expensive than path2 for fetching the specified fraction
>>   *   of the total tuples.
>> diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c
>> index a30a2c2eb8..72e6a7ea61 100644
>> --- a/src/common/pg_lzcompress.c
>> +++ b/src/common/pg_lzcompress.c
>> @@ -825,7 +825,7 @@ pglz_decompress(const char *source, int32 slen, char
>> *dest,
>>
>>
>>  /* --
>> - * pglz_max_compressed_size -
>> + * pglz_maximum_compressed_size -
>>   *
>>   * Calculate the maximum compressed size for a given amount
>> of raw data.
>>   * Return the maximum size, or total compressed size if
>> maximum size is
>>
>
> Hi,
> For the first list, do you want to include the path to the file for
> IDENTIFICATION ?
> If so, I can prepare a patch covering the files in that list.
>
> Cheers
>

Here is an updated patch covering the first list.


header-ident.patch
Description: Binary data


Re: incorrect file name in backend_progress.c header

2021-09-10 Thread Zhihong Yu
On Fri, Sep 10, 2021 at 10:56 AM Justin Pryzby  wrote:

> On Fri, Sep 10, 2021 at 10:11:34AM -0700, Zhihong Yu wrote:
> > Hi,
> > I was looking at backend_progress.c and noticed that the filename and
> path
> > were wrong in the header.
> >
> > Here is patch which corrects the mistake.
>
> For the record, I don't really like boilerplate, but fixing the
> boilerplates
> all at once is at least better than fixing them one at a time.
>
> Would you want to prepare a patch handling all of these ?
>
> $ find src/ -name '*.c' -type f -print0 |xargs -r0 awk '{fn=gensub(".*/",
> "", "1", FILENAME)} FILENAME~/scripts/{fn=gensub("\\.c", "", 1, fn)} FNR==1
> && /---/{top=1} /\*\//{top=0} !top{next} FNR==3 && NF==2 && $2!=fn{print
> FILENAME,"head",fn,$2} /IDENTIFICATION/{getline; if ($0!~FILENAME){print
> FILENAME,"foot",$2}}'
> src/backend/catalog/pg_publication.c foot pg_publication.c
> src/backend/utils/activity/wait_event.c foot
> src/backend/postmaster/wait_event.c
> src/backend/utils/activity/backend_status.c foot
> src/backend/postmaster/backend_status.c
> src/backend/utils/adt/version.c foot
> src/backend/replication/logical/reorderbuffer.c foot
> src/backend/replication/reorderbuffer.c
> src/backend/replication/logical/snapbuild.c foot
> src/backend/replication/snapbuild.c
> src/backend/replication/logical/logicalfuncs.c foot
> src/backend/replication/logicalfuncs.c
> src/backend/optimizer/util/inherit.c foot
> src/backend/optimizer/path/inherit.c
> src/backend/optimizer/util/appendinfo.c foot
> src/backend/optimizer/path/appendinfo.c
> src/backend/commands/publicationcmds.c foot publicationcmds.c
> src/backend/commands/subscriptioncmds.c foot subscriptioncmds.c
> src/interfaces/libpq/fe-misc.c head fe-misc.c FILE
> src/bin/scripts/common.c head common common.c
> src/port/pgcheckdir.c head pgcheckdir.c src/port/pgcheckdir.c
>
> There's some function comments wrong too.
> In case someone want to fix them together.
>
> commit 4fba6c5044da43c1fa263125e422e869ae449ae7
> Author: Justin Pryzby 
> Date:   Sun Sep 5 18:14:39 2021 -0500
>
> Wrong function name in header comment
>
> Found like this
>
> for f in `find src -type f -name '*.c'`; do awk -v p=0 '/^\/\*
> *-*$/{h=$0; getline; h=h"\n"$0; g=gensub( "[^_[:alnum:]].*", "", 1, $2);
> p=1} 0&&/^{/{p=0; print h}; /^ \*\/$/{h=h"\n"$0; getline a; h=h"\n"a;
> getline f; h=h"\n"f; l=length(g); if (substr(f,1,7) == substr(g,1,7) &&
> substr(f,1,l) != substr(g,1,l)) print FILENAME,g,f,"\n"h; next} 0&&/^[^s/
> {]/{p=0; h=""; next} 0&{h=h"\n"$0}' "$f"; done |less
>
> diff --git a/src/backend/optimizer/util/pathnode.c
> b/src/backend/optimizer/util/pathnode.c
> index cedb3848dd..e53d381e19 100644
> --- a/src/backend/optimizer/util/pathnode.c
> +++ b/src/backend/optimizer/util/pathnode.c
> @@ -105,7 +105,7 @@ compare_path_costs(Path *path1, Path *path2,
> CostSelector criterion)
>  }
>
>  /*
> - * compare_path_fractional_costs
> + * compare_fractional_path_costs
>   *   Return -1, 0, or +1 according as path1 is cheaper, the same cost,
>   *   or more expensive than path2 for fetching the specified fraction
>   *   of the total tuples.
> diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c
> index a30a2c2eb8..72e6a7ea61 100644
> --- a/src/common/pg_lzcompress.c
> +++ b/src/common/pg_lzcompress.c
> @@ -825,7 +825,7 @@ pglz_decompress(const char *source, int32 slen, char
> *dest,
>
>
>  /* --
> - * pglz_max_compressed_size -
> + * pglz_maximum_compressed_size -
>   *
>   * Calculate the maximum compressed size for a given amount
> of raw data.
>   * Return the maximum size, or total compressed size if
> maximum size is
>

Hi,
For the first list, do you want to include the path to the file for
IDENTIFICATION ?
If so, I can prepare a patch covering the files in that list.

Cheers


Re: incorrect file name in backend_progress.c header

2021-09-10 Thread Justin Pryzby
On Fri, Sep 10, 2021 at 10:11:34AM -0700, Zhihong Yu wrote:
> Hi,
> I was looking at backend_progress.c and noticed that the filename and path
> were wrong in the header.
> 
> Here is patch which corrects the mistake.

For the record, I don't really like boilerplate, but fixing the boilerplates
all at once is at least better than fixing them one at a time.

Would you want to prepare a patch handling all of these ?

$ find src/ -name '*.c' -type f -print0 |xargs -r0 awk '{fn=gensub(".*/", "", 
"1", FILENAME)} FILENAME~/scripts/{fn=gensub("\\.c", "", 1, fn)} FNR==1 && 
/---/{top=1} /\*\//{top=0} !top{next} FNR==3 && NF==2 && $2!=fn{print 
FILENAME,"head",fn,$2} /IDENTIFICATION/{getline; if ($0!~FILENAME){print 
FILENAME,"foot",$2}}'
src/backend/catalog/pg_publication.c foot pg_publication.c
src/backend/utils/activity/wait_event.c foot src/backend/postmaster/wait_event.c
src/backend/utils/activity/backend_status.c foot src/backend/postmaster/backend_status.c
src/backend/utils/adt/version.c foot
src/backend/replication/logical/reorderbuffer.c foot src/backend/replication/reorderbuffer.c
src/backend/replication/logical/snapbuild.c foot src/backend/replication/snapbuild.c
src/backend/replication/logical/logicalfuncs.c foot src/backend/replication/logicalfuncs.c
src/backend/optimizer/util/inherit.c foot src/backend/optimizer/path/inherit.c
src/backend/optimizer/util/appendinfo.c foot src/backend/optimizer/path/appendinfo.c
src/backend/commands/publicationcmds.c foot publicationcmds.c
src/backend/commands/subscriptioncmds.c foot subscriptioncmds.c
src/interfaces/libpq/fe-misc.c head fe-misc.c FILE
src/bin/scripts/common.c head common common.c
src/port/pgcheckdir.c head pgcheckdir.c src/port/pgcheckdir.c

There's some function comments wrong too.
In case someone want to fix them together.

commit 4fba6c5044da43c1fa263125e422e869ae449ae7
Author: Justin Pryzby 
Date:   Sun Sep 5 18:14:39 2021 -0500

Wrong function name in header comment

Found like this

for f in `find src -type f -name '*.c'`; do awk -v p=0 '/^\/\* *-*$/{h=$0; 
getline; h=h"\n"$0; g=gensub( "[^_[:alnum:]].*", "", 1, $2); p=1} 0&&/^{/{p=0; 
print h}; /^ \*\/$/{h=h"\n"$0; getline a; h=h"\n"a; getline f; h=h"\n"f; 
l=length(g); if (substr(f,1,7) == substr(g,1,7) && substr(f,1,l) != 
substr(g,1,l)) print FILENAME,g,f,"\n"h; next} 0&&/^[^s/ {]/{p=0; h=""; next} 
0&{h=h"\n"$0}' "$f"; done |less

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index cedb3848dd..e53d381e19 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -105,7 +105,7 @@ compare_path_costs(Path *path1, Path *path2, CostSelector criterion)
 }
 
 /*
- * compare_path_fractional_costs
+ * compare_fractional_path_costs
  *   Return -1, 0, or +1 according as path1 is cheaper, the same cost,
  *   or more expensive than path2 for fetching the specified fraction
  *   of the total tuples.
diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c
index a30a2c2eb8..72e6a7ea61 100644
--- a/src/common/pg_lzcompress.c
+++ b/src/common/pg_lzcompress.c
@@ -825,7 +825,7 @@ pglz_decompress(const char *source, int32 slen, char *dest,
 
 
 /* ----------
- * pglz_max_compressed_size -
+ * pglz_maximum_compressed_size -
  *
  *   Calculate the maximum compressed size for a given amount of raw data.
  *   Return the maximum size, or total compressed size if maximum size is




Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-10 Thread Robert Haas
On Fri, Sep 10, 2021 at 1:16 PM Mark Dilger
 wrote:
> uses "immediately" and "will kill the running transaction" which reenforced 
> the impression that this mechanism is heavier handed than it is.

It's intended to be just as immediate as e.g. pg_cancel_backend() and
pg_terminate_backend(), which work just the same way, but not any more
so. I guess we could look at how things are worded in those cases.
From a user perspective such things are usually pretty immediate, but
not as immediate as firing a signal handler. Computers are fast.[1]
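
For anyone less familiar with that machinery, the shape is roughly this
(heavily simplified sketch; the real InterruptPending / CHECK_FOR_INTERRUPTS()
/ ProcessInterrupts() versions live in the backend):

#include <signal.h>

static volatile sig_atomic_t InterruptPending = 0;

static void ProcessInterrupts(void);

/* The signal handler only records that something was requested. */
static void
handle_signal(int signo)
{
    InterruptPending = 1;
}

/* The actual work happens at the next explicit check point. */
#define CHECK_FOR_INTERRUPTS() \
    do { if (InterruptPending) ProcessInterrupts(); } while (0)

static void
ProcessInterrupts(void)
{
    InterruptPending = 0;
    /* cancel the query, exit the backend, absorb a barrier, etc. */
}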

-- 
Robert Haas
EDB: http://www.enterprisedb.com

[1] https://www.youtube.com/watch?v=6xijhqU8r2A




Re: parallelizing the archiver

2021-09-10 Thread Robert Haas
On Fri, Sep 10, 2021 at 1:07 PM Bossart, Nathan  wrote:
> That being said, I think the discussion about batching is a good one
> to have.  If the overhead described in your SCP example is
> representative of a typical archive_command, then parallelism does
> seem a bit silly.

I think that's pretty realistic, because a lot of people's archive
commands are going to actually be, or need to use, scp specifically.
However, there are also cases where people are using commands that
just put the file in some local directory (maybe on a remote mount
point) and I would expect the startup overhead to be much less in
those cases. Maybe people are archiving via HTTPS or similar as well,
and then you again have some connection overhead though, I suspect,
not as much as scp, since web pages do not take 3 seconds to get an
https connection going. I don't know why scp is so crazy slow.

Even in the relatively low-overhead cases, though, I think we would
want to do some real testing to see if the benefits are as we expect.
See http://postgr.es/m/20200420211018.w2qphw4yybcbx...@alap3.anarazel.de
and downthread for context. I was *convinced* that parallel backup was
a win. Benchmarking was a tad underwhelming, but there was a clear if
modest benefit by running a synthetic test of copying a lot of files
serially or in parallel, with the files spread across multiple
filesystems on the same physical box. However, when Andres modified my
test program to use posix_fadvise(), posix_fallocate(), and
sync_file_range() while doing the copies, the benefits of parallelism
largely evaporated, and in fact in some cases enabling parallelism
caused major regressions. In other words, the apparent benefits of
parallelism were really due to suboptimal behaviors in the Linux page
cache and some NUMA effects that were in fact avoidable.

So I'm suspicious that the same things might end up being true here.
It's not exactly the same, because the goal of WAL archiving is to
keep up with the rate of WAL generation, and the goal of a backup is
(unless max-rate is used) to finish as fast as possible, and that
difference in goals might end up being significant. Also, you can make
an argument that some people will benefit from a parallelism feature
even if a perfectly-implemented archive_command doesn't, because many
people use really terrible archive_commnads. But all that said, I
think the parallel backup discussion is still a cautionary tale to
which some attention ought to be paid.

> We'd essentially be using a ton more resources when
> there's obvious room for improvement via reducing amount of overhead
> per archive.  I think we could easily make the batch size configurable
> so that existing archive commands would work (e.g.,
> archive_batch_size=1).  However, unlike the simple parallel approach,
> you'd likely have to adjust your archive_command if you wanted to make
> use of batching.  That doesn't seem terrible to me, though.  As
> discussed above, there are some implementation details to work out for
> archive failures, but nothing about that seems intractable to me.
> Plus, if you still wanted to parallelize things, feeding your
> archive_command several files at a time could still be helpful.

Yep.

> I'm currently leaning toward exploring the batching approach first.  I
> suppose we could always make a prototype of both solutions for
> comparison with some "typical" archive commands if that would help
> with the discussion.

Yeah, I think the concerns here are more pragmatic than philosophical,
at least for me.

I had kind of been thinking that the way to attack this problem is to
go straight to allowing for a background worker, because the other
problem with archive_command is that running a shell command like cp,
scp, or rsync is not really safe. It won't fsync your data, it might
not fail if the file is in the archive already, and it definitely
won't succeed without doing anything if there's a byte for byte
identical file in the archive and fail if there's a file with
different contents already in the archive. Fixing that stuff by
running different shell commands is hard, but it wouldn't be that hard
to do it in C code, and you could then also extend whatever code you
wrote to do batching and parallelism; starting more workers isn't
hard.

However, I can't see the idea of running a shell command going away
any time soon, in spite of its numerous and severe drawbacks. Such an
interface provides a huge degree of flexibility and allows system
admins to whack around behavior easily, which you don't get if you
have to code every change in C. So I think command-based enhancements
are fine to pursue also, even though I don't think it's the ideal
place for most users to end up.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Numeric x^y for negative x

2021-09-10 Thread Jaime Casanova
On Thu, Sep 02, 2021 at 07:27:09AM +0100, Dean Rasheed wrote:
> On Thu, 2 Sept 2021 at 00:39, Jaime Casanova
>  wrote:
> >
> > Hi Dean,
> >
> > It seems you already committed this. But it's still as "Ready for
> > committer" in the commitfest app.
> >
> > Are we waiting for something else or we can mark it as committed?
> >
> 
> It's mostly done, but there is one more corner case where it loses
> precision. I'll post an update shortly.
> 

Great! I'm marking this as "waiting on author".

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: parallelizing the archiver

2021-09-10 Thread Bossart, Nathan
On 9/10/21, 10:12 AM, "Robert Haas"  wrote:
> If on the other hand you imagine a system that's not very busy, say 1
> WAL file being archived every 10 seconds, then using a batch size of
> 30 would very significantly delay removal of old files. However, on
> this system, batching probably isn't really needed. The rate of WAL
> file generation is low enough that if you pay the startup cost of your
> archive_command for every file, you're probably still doing just fine.
>
> Probably, any kind of parallelism or batching needs to take this kind
> of time-based thinking into account. For batching, the rate at which
> files are generated should affect the batch size. For parallelism, it
> should affect the number of processes used.

I was thinking that archive_batch_size would be the maximum batch
size.  If the archiver only finds a single file to archive, that's all
it'd send to the archive command.  If it finds more, it'd send up to
archive_batch_size to the command.

Nathan



Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-10 Thread Mark Dilger



> On Sep 10, 2021, at 9:56 AM, Robert Haas  wrote:
> 
> I think the relevant question here is not "could a signal handler
> fire?" but "can we hit a CHECK_FOR_INTERRUPTS()?". If the relevant
> question is the former, then there's no hope of ever making it work
> because there's always a race condition. But the signal handler is
> only setting flags whose only effect is to make a subsequent
> CHECK_FOR_INTERRUPTS() do something, so it doesn't really matter when
> the signal handler can run, but when CHECK_FOR_INTERRUPTS() can call
> ProcessInterrupts().

Ok, that makes more sense.  I was reviewing the code after first reviewing the 
documentation changes, which led me to believe the system was designed to 
respond more quickly than that:

+WAL prohibited is a read-only system state. Any permitted user can call
+pg_prohibit_wal function to forces the system into
+a WAL prohibited mode where insert write ahead log will be prohibited until
+the same function executed to change that state to read-write. Like Hot

and

+Otherwise, it will be off.  When the user requests WAL
+prohibited state, at that moment if any existing session is already running
+a transaction, and that transaction has already been performed or planning
+to perform wal write operations then the session running that transaction
+will be terminated.

"forces the system" in the first part, and "at that moment ... that transaction 
will be terminated" sounds heavier handed than something which merely sets a 
flag asking the backend to exit.  I was reading that as more immediate and then 
trying to figure out how the signal handling could possibly work, and failing 
to see how.

The README:

+Any
+backends which receive WAL prohibited system state transition barrier interrupt
+need to stop WAL writing immediately.  For barrier absorption the backed(s) 
will
+kill the running transaction which has valid XID indicates that the transaction
+has performed and/or planning WAL write.

uses "immediately" and "will kill the running transaction" which reenforced the 
impression that this mechanism is heavier handed than it is.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







incorrect file name in backend_progress.c header

2021-09-10 Thread Zhihong Yu
Hi,
I was looking at backend_progress.c and noticed that the filename and path
were wrong in the header.

Here is patch which corrects the mistake.

Please take a look.

Thanks


backend_prog-hdr.patch
Description: Binary data


Re: parallelizing the archiver

2021-09-10 Thread Robert Haas
On Fri, Sep 10, 2021 at 11:49 AM Julien Rouhaud  wrote:
> I totally agree that batching as many files as possible in a single
> command is probably what's gonna achieve the best performance.  But if
> the archiver only gets an answer from the archive_command once it
> tried to process all of the files, it also means that postgres won't be
> able to remove any WAL file until all of them could be processed.  It
> means that users will likely have to limit the batch size and
> therefore pay more startup overhead than they would like.  In case of
> archiving on server with high latency / connection overhead it may be
> better to be able to run multiple commands in parallel.  I may be
> overthinking here and definitely having feedback from people with more
> experience around that would be welcome.

That's a fair point. I'm not sure how much it matters, though. I think
you want to imagine a system where there are let's say 10 WAL files
being archived per second.  Using fork() + exec() to spawn a shell
command 10 times per second is a bit expensive, whether you do it
serially or in parallel, and even if the command is something with a
less-insane startup overhead than scp. If we start a shell command say
every 3 seconds and give it 30 files each time, we can reduce the
startup costs we're paying by ~97% at the price of having to wait up
to 3 additional seconds to know that archiving succeeded for any
particular file. That sounds like a pretty good trade-off, because the
main benefit of removing old files is that it keeps us from running
out of disk space, and you should not be running a busy system in such
a way that it is ever within 3 seconds of running out of disk space,
so whatever.

If on the other hand you imagine a system that's not very busy, say 1
WAL file being archived every 10 seconds, then using a batch size of
30 would very significantly delay removal of old files. However, on
this system, batching probably isn't really needed. The rate of WAL
file generation is low enough that if you pay the startup cost of your
archive_command for every file, you're probably still doing just fine.

Probably, any kind of parallelism or batching needs to take this kind
of time-based thinking into account. For batching, the rate at which
files are generated should affect the batch size. For parallelism, it
should affect the number of processes used.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: parallelizing the archiver

2021-09-10 Thread Jacob Champion
On Fri, 2021-09-10 at 23:48 +0800, Julien Rouhaud wrote:
> I totally agree that batching as many files as possible in a single
> command is probably what's gonna achieve the best performance.  But if
> the archiver only gets an answer from the archive_command once it
> tried to process all of the files, it also means that postgres won't be
> able to remove any WAL file until all of them could be processed.  It
> means that users will likely have to limit the batch size and
> therefore pay more startup overhead than they would like.  In case of
> archiving on server with high latency / connection overhead it may be
> better to be able to run multiple commands in parallel.

Well, users would also have to limit the parallelism, right? If
connections are high-overhead, I wouldn't imagine that running hundreds
of them simultaneously would work very well in practice. (The proof
would be in an actual benchmark, obviously, but usually I would rather
have one process handling a hundred items than a hundred processes
handling one item each.)

For a batching scheme, would it be that big a deal to wait for all of
them to be archived before removal?

> > That is possibly true. I think it might work to just assume that you
> > have to retry everything if it exits non-zero, but that requires the
> > archive command to be smart enough to do something sensible if an
> > identical file is already present in the archive.
> 
> Yes, it could be.  I think that we need more feedback for that too.

Seems like this is the sticking point. What would be the smartest thing
for the command to do? If there's a destination file already, checksum
it and make sure it matches the source before continuing?
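
A minimal sketch of that kind of check, comparing the two files byte for
byte rather than via a checksum (illustrative only, not from any patch):

#include <stdbool.h>
#include <stdio.h>

/* Return true only if both files exist and have identical contents. */
static bool
files_identical(const char *src_path, const char *dst_path)
{
	FILE	   *src = fopen(src_path, "rb");
	FILE	   *dst = fopen(dst_path, "rb");
	bool		same = (src != NULL && dst != NULL);
	int			c1 = EOF,
				c2 = EOF;

	while (same)
	{
		c1 = fgetc(src);
		c2 = fgetc(dst);
		if (c1 != c2)
			same = false;
		if (c1 == EOF || c2 == EOF)
			break;
	}

	if (src)
		fclose(src);
	if (dst)
		fclose(dst);
	return same && c1 == EOF && c2 == EOF;
}

An archive command could then treat "destination already exists and is
identical" as success, and an existing-but-different destination as a hard
failure.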

--Jacob


Re: parallelizing the archiver

2021-09-10 Thread Bossart, Nathan
On 9/10/21, 8:22 AM, "Robert Haas"  wrote:
> On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud  wrote:
>> Those approaches don't really seem mutually exclusive?  In both cases
>> you will need to internally track the status of each WAL file and
>> handle non-contiguous file sequences.  In case of parallel commands
>> you only need the additional knowledge that some command is already
>> working on a file.  Wouldn't it be even better to eventually be able
>> to launch multiple batches of multiple files rather than a single batch?
>
> Well, I guess I'm not convinced. Perhaps people with more knowledge of
> this than I may already know why it's beneficial, but in my experience
> commands like 'cp' and 'scp' are usually limited by the speed of I/O,
> not the fact that you only have one of them running at once. Running
> several at once, again in my experience, is typically not much faster.
> On the other hand, scp has a LOT of startup overhead, so it's easy to
> see the benefits of batching.
>
> [...]
>
>> If we start with parallelism first, the whole ecosystem could
>> immediately benefit from it as is.  To be able to handle multiple
>> files in a single command, we would need some way to let the server
>> know which files were successfully archived and which files weren't,
>> so it requires a different communication approach than the command
>> return code.
>
> That is possibly true. I think it might work to just assume that you
> have to retry everything if it exits non-zero, but that requires the
> archive command to be smart enough to do something sensible if an
> identical file is already present in the archive.

My initial thinking was similar to Julien's.  Assuming I have an
archive_command that handles one file, I can just set
archive_max_workers to 3 and reap the benefits.  If I'm using an
existing utility that implements its own parallelism, I can keep
archive_max_workers at 1 and continue using it.  This would be a
simple incremental improvement.

That being said, I think the discussion about batching is a good one
to have.  If the overhead described in your SCP example is
representative of a typical archive_command, then parallelism does
seem a bit silly.  We'd essentially be using a ton more resources when
there's obvious room for improvement via reducing the amount of overhead
per archive.  I think we could easily make the batch size configurable
so that existing archive commands would work (e.g.,
archive_batch_size=1).  However, unlike the simple parallel approach,
you'd likely have to adjust your archive_command if you wanted to make
use of batching.  That doesn't seem terrible to me, though.  As
discussed above, there are some implementation details to work out for
archive failures, but nothing about that seems intractable to me.
Plus, if you still wanted to parallelize things, feeding your
archive_command several files at a time could still be helpful.

I'm currently leaning toward exploring the batching approach first.  I
suppose we could always make a prototype of both solutions for
comparison with some "typical" archive commands if that would help
with the discussion.

Nathan



Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-10 Thread Robert Haas
On Fri, Sep 10, 2021 at 12:20 PM Mark Dilger
 wrote:
> A better example may be found in ginmetapage.c:
>
> needwal = RelationNeedsWAL(indexrel);
> if (needwal)
> {
> CheckWALPermitted();
> computeLeafRecompressWALData(leaf);
> }
>
> /* Apply changes to page */
> START_CRIT_SECTION();

Yeah, that looks sketchy. Why not move CheckWALPermitted() down a line?
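
That is, something like the following (a sketch of the reordering, not a
tested change), so the permission check is the last thing done before the
critical section:

	needwal = RelationNeedsWAL(indexrel);
	if (needwal)
	{
		computeLeafRecompressWALData(leaf);
		CheckWALPermitted();
	}

	/* Apply changes to page */
	START_CRIT_SECTION();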

> Even if CheckWALPermitted is assumed to be close enough to atomic to not be a 
> problem (I don't agree), that argument can't be made here, as 
> computeLeafRecompressWALData is not trivial and signals could easily be 
> processed while it is running.

I think the relevant question here is not "could a signal handler
fire?" but "can we hit a CHECK_FOR_INTERRUPTS()?". If the relevant
question is the former, then there's no hope of ever making it work
because there's always a race condition. But the signal handler is
only setting flags whose only effect is to make a subsequent
CHECK_FOR_INTERRUPTS() do something, so what really matters is not when
the signal handler can run, but when CHECK_FOR_INTERRUPTS() can call
ProcessInterrupts().
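
For readers less familiar with that machinery, a simplified sketch of the
pattern being described (not the actual definitions from the tree):

#include <signal.h>

static volatile sig_atomic_t InterruptPending = 0;

/* stand-in for the real ProcessInterrupts(), which acts on the flags */
static void
ProcessInterrupts(void)
{
	/* e.g. error out of the query, or exit the backend */
}

/* the signal handler only records that something happened */
static void
handle_signal(int signo)
{
	InterruptPending = 1;
}

/* real work happens only at well-defined points, when this macro is reached */
#define CHECK_FOR_INTERRUPTS() \
	do { \
		if (InterruptPending) \
			ProcessInterrupts(); \
	} while (0)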

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Toast compression method options

2021-09-10 Thread Jaime Casanova
On Fri, Sep 10, 2021 at 10:54:04AM +0530, Dilip Kumar wrote:
> On Fri, 10 Sep 2021 at 10:40 AM, Jaime Casanova <
> jcasa...@systemguards.com.ec> wrote:
> 
> > On Mon, Jul 19, 2021 at 01:24:03PM +0530, Dilip Kumar wrote:
> > > On Sun, Jul 18, 2021 at 9:15 PM Dilip Kumar 
> > wrote:
> > > >
> > > > On Wed, Jul 14, 2021 at 5:35 PM vignesh C  wrote:
> > > > >
> > > > > On Thu, May 6, 2021 at 7:24 PM Dilip Kumar 
> > wrote
> > > > >
> > > > > The patch does not apply on Head anymore, could you rebase and post a
> > > > > patch. I'm changing the status to "Waiting for Author".
> > > >
> > > > Okay, I will rebase and send it by next week.
> > >
> > > I have rebased the patch.
> > >
> >
> > Hi,
> >
> > This doesn't apply cleanly nor compile.
> > Are you planning to send a rebased version?
> 
> 
> I will do that early next week.
> 

Great! I'm marking the patch as "waiting on author". 
Thanks for continuing to work on this.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-10 Thread Mark Dilger



> On Sep 10, 2021, at 8:42 AM, Mark Dilger  wrote:
> 
> Take for example a code stanza from heapam.c:
> 
>if (needwal)
>CheckWALPermitted();
> 
>/* NO EREPORT(ERROR) from here till changes are logged */
>START_CRIT_SECTION();
> 
> Now, I know that interrupts won't be processed after starting the critical 
> section, but I can see plain as day that an interrupt might get processed 
> *during* CheckWALPermitted, since that function isn't atomic. 

A better example may be found in ginmetapage.c:

needwal = RelationNeedsWAL(indexrel);
if (needwal)
{
CheckWALPermitted();
computeLeafRecompressWALData(leaf);
}

/* Apply changes to page */
START_CRIT_SECTION();

Even if CheckWALPermitted is assumed to be close enough to atomic to not be a 
problem (I don't agree), that argument can't be made here, as 
computeLeafRecompressWALData is not trivial and signals could easily be 
processed while it is running.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: parallelizing the archiver

2021-09-10 Thread Andrey Borodin



> On 10 Sep 2021, at 19:19, Julien Rouhaud  wrote:
> Wouldn't it be better to
> have a new archive_mode, e.g. "daemon", and have postgres responsible
> to (re)start it, and pass information through the daemon's
> stdin/stdout or something like that?

We don't even need to introduce new archive_mode.
Currently archive_command has no expectations regarding stdin/stdout.
Let's just say that we will push new WAL names to stdin until archive_command 
exits.
And if archive_command prints something to stdout we will interpret it as 
archived WAL names.
That's it.
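
A rough sketch of an archive_command speaking that protocol (purely
illustrative; do_archive() is a hypothetical placeholder for the real
copy-to-archive step):

#include <stdio.h>
#include <string.h>

/* hypothetical placeholder: copy pg_wal/<walname> to the archive, 0 on success */
static int
do_archive(const char *walname)
{
	(void) walname;
	return 0;
}

int
main(void)
{
	char		line[1024];

	/* read one WAL file name per line from stdin, as pushed by the archiver */
	while (fgets(line, sizeof(line), stdin) != NULL)
	{
		line[strcspn(line, "\n")] = '\0';

		if (do_archive(line) != 0)
			return 1;			/* exiting makes the archiver restart us */

		/* report the name back on stdout once it is safely archived */
		printf("%s\n", line);
		fflush(stdout);
	}
	return 0;
}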

Existing archive_commands will continue as is.

Currently, information about what is archived is stored on the filesystem in
the archive_status dir. We do not need to change anything.
If archive_command exits (with any exit code) we will restart it if there are 
WAL files that still were not archived.

Best regards, Andrey Borodin.



Re: parallelizing the archiver

2021-09-10 Thread Julien Rouhaud
On Fri, Sep 10, 2021 at 11:22 PM Robert Haas  wrote:
>
> Well, I guess I'm not convinced. Perhaps people with more knowledge of
> this than I may already know why it's beneficial, but in my experience
> commands like 'cp' and 'scp' are usually limited by the speed of I/O,
> not the fact that you only have one of them running at once. Running
> several at once, again in my experience, is typically not much faster.
> On the other hand, scp has a LOT of startup overhead, so it's easy to
> see the benefits of batching.

I totally agree that batching as many files as possible in a single
command is probably what's gonna achieve the best performance.  But if
the archiver only gets an answer from the archive_command once it
tried to process all of the files, it also means that postgres won't be
able to remove any WAL file until all of them could be processed.  It
means that users will likely have to limit the batch size and
therefore pay more startup overhead than they would like.  In case of
archiving on a server with high latency / connection overhead it may be
better to be able to run multiple commands in parallel.  I may be
overthinking here, and feedback from people with more
experience around that would definitely be welcome.

> That is possibly true. I think it might work to just assume that you
> have to retry everything if it exits non-zero, but that requires the
> archive command to be smart enough to do something sensible if an
> identical file is already present in the archive.

Yes, it could be.  I think that we need more feedback for that too.

> Sure. Actually, I think a background worker would be better than a
> separate daemon. Then it could just talk to shared memory directly.

I thought about it too, but I was under the impression that most
people would want to implement a custom daemon (or already have) with
a more parallel/thread-friendly language.




Re: a misbehavior of partition row movement (?)

2021-09-10 Thread Zhihong Yu
On Fri, Sep 10, 2021 at 7:06 AM Amit Langote 
wrote:

> On Fri, Sep 3, 2021 at 12:23 PM Amit Langote 
> wrote:
> > Hi Andrew,
> >
> > On Fri, Sep 3, 2021 at 6:19 AM Andrew Dunstan 
> wrote:
> > > On 7/13/21 8:09 AM, Amit Langote wrote:
> > > > Unfortunately, I don’t think I’ll have time in this CF to solve some
> > > > very fundamental issues I found in the patch during the last cycle.
> > > > I’m fine with either marking this as RwF for now or move to the next
> CF.
> > >
> > > Amit, do you have time now to work on this?
> >
> > I will take some time next week to take a fresh look at this and post an
> update.
>
> So I started looking at this today.  I didn't make much of an inroad into
> the stumbling block with 0002 patch that I had mentioned back in [1],
> though I decided to at least post a rebased version of the patches
> that apply.
>
> I think 0001 is independently committable on its own merits,
> irrespective of the yet unresolved problems of 0002, a patch to fix
> $subject, which I'll continue to work on.
>
> 0003 shows a test that crashes the server due to said problem.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>
> [1]
> https://www.postgresql.org/message-id/CA%2BHiwqHMpNZOc2Z-zgdO9hbJ7wMCOC%3DWpJYszVusZ%3DoE2OTf8w%40mail.gmail.com

Hi,
For patch 0001, GetForeignKeyActionTriggers:

+   if (!OidIsValid(*deleteTriggerOid) || !OidIsValid(*updateTriggerOid))
+   elog(ERROR, "could not find action triggers of foreign key
constraint %u",

I think if the error message includes whether it is the delete or update
trigger that isn't found, it would be helpful.

Similar comment for error message in GetForeignKeyCheckTriggers().

Cheers


Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-10 Thread Mark Dilger



> On Sep 10, 2021, at 7:36 AM, Amul Sul  wrote:
> 
>> v33-0005
>> 
>> This patch makes bool XLogInsertAllowed() more complicated than before.  The 
>> result used to depend mostly on the value of LocalXLogInsertAllowed except 
>> that when that value was negative, the result was determined by 
>> RecoveryInProgress(). There was an arcane rule that LocalXLogInsertAllowed 
>> must have the non-negative values binary coercible to boolean "true" and 
>> "false", with the basis for that rule being the coding of 
>> XLogInsertAllowed().  Now that the function is more complicated, this rule 
>> seems even more arcane.  Can we change the logic to not depend on casting an 
>> integer to bool?
>> 
> 
> We can't use a boolean variable because LocalXLogInsertAllowed
> represents three states: 1 means "wal is allowed", 0 means "wal is
> disallowed", and -1 means "need to check".

I'm complaining that we're using an integer rather than an enum.  I'm ok if we 
define it so that WAL_ALLOWABLE_UNKNOWN = -1, WAL_DISALLOWED = 0, WAL_ALLOWED = 
1 or such, but the logic of the function has gotten complicated enough that 
having to remember which number represents which logical condition has become a 
(small) mental burden.  Given how hard the WAL code is to read and fully grok, 
I'd rather avoid any unnecessary burden, even small ones.
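
Roughly this sort of thing (a sketch only; apart from the three value names
above, the names are illustrative and not taken from the patch):

typedef enum WalAllowState
{
	WAL_ALLOWABLE_UNKNOWN = -1, /* not determined yet; consult recovery state */
	WAL_DISALLOWED = 0,			/* WAL insertion is not permitted */
	WAL_ALLOWED = 1				/* WAL insertion is permitted */
} WalAllowState;

static WalAllowState LocalXLogInsertAllowed = WAL_ALLOWABLE_UNKNOWN;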

>> The new function CheckWALPermitted() seems to test the current state of 
>> variables but not lock any of them, and the new function comment says:
>> 
> 
> CheckWALPermitted() calls XLogInsertAllowed(), which does check the
> LocalXLogInsertAllowed flag; that flag is local to the process, and
> nobody else reads it concurrently.
> 
>> /*
>> * In opposite to the above assertion if a transaction doesn't have valid XID
>> * (e.g. VACUUM) then it won't be killed while changing the system state to 
>> WAL
>> * prohibited.  Therefore, we need to explicitly error out before entering 
>> into
>> * the critical section.
>> */
>> 
>> This suggests to me that a vacuum process can check whether wal is 
>> prohibited, then begin a critical section which needs wal to be allowed, and 
>> concurrently somebody else might disable wal without killing the vacuum 
>> process.  I'm given to wonder what horrors await when the vacuum process 
>> does something that needs to be wal logged but cannot be.  Does it trigger a 
>> panic?  I don't like the idea that calling pg_prohibit_wal durning a vacuum 
>> might panic the cluster.  If there is some reason this is not a problem, I 
>> think the comment should explain it.  In particular, why is it sufficient to 
>> check whether wal is prohibited before entering the critical section and not 
>> necessary to be sure it remains allowed through the lifetime of that 
>> critical section?
>> 
> 
> Hm, interrupt absorption is disabled inside the critical section.
> The wal-prohibited state for that process (here vacuum) will never get
> set until it sees the interrupt, and the system will not be considered
> wal prohibited until every process has seen that interrupt. I am not sure
> we should explain the characteristics of the critical section at this
> place; if we want, we can add a brief note saying that inside the critical
> section we need not worry about the state change, which never happens
> because interrupts are disabled there.

I think the fact that interrupts are disabled during critical sections is 
understood, so there is no need to mention that.  The problem is that the 
method for taking the system read-only is less generally known, and readers of 
other sections of code need to jump to the definition of CheckWALPermitted to 
read the comments and understand what it does.  Take for example a code stanza 
from heapam.c:

if (needwal)
CheckWALPermitted();

/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();

Now, I know that interrupts won't be processed after starting the critical 
section, but I can see plain as day that an interrupt might get processed 
*during* CheckWALPermitted, since that function isn't atomic.  It might happen 
after the check is meaningfully finished but before the function actually 
returns.  So I'm not inclined to believe that the way this all works is 
dependent on interrupts being blocked.  So I think, maybe this is all protected 
by some other scheme.  But what?  It's not clear from the code comments for 
CheckWALPermitted, so I'm left having to reverse engineer the system to 
understand it.

One interpretation is that the signal handler will exit() my backend if it 
receives a signal saying that the system is going read-only, so there is no 
race condition.  But then why the call to CheckWALPermitted()?  If this 
interpretation were correct, we'd happily enter the critical section without 
checking, secure in the knowledge that as long as we haven't exited yet, all is 
ok.

Another interpretation is that the whole thing is just a performance trick.  
Maybe we're ok with the idea that we will 

Re: parallelizing the archiver

2021-09-10 Thread Robert Haas
On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud  wrote:
> Those approaches don't really seem mutually exclusive?  In both cases
> you will need to internally track the status of each WAL file and
> handle non-contiguous file sequences.  In case of parallel commands
> you only need the additional knowledge that some command is already
> working on a file.  Wouldn't it be even better to eventually be able
> to launch multiple batches of multiple files rather than a single batch?

Well, I guess I'm not convinced. Perhaps people with more knowledge of
this than I may already know why it's beneficial, but in my experience
commands like 'cp' and 'scp' are usually limited by the speed of I/O,
not the fact that you only have one of them running at once. Running
several at once, again in my experience, is typically not much faster.
On the other hand, scp has a LOT of startup overhead, so it's easy to
see the benefits of batching.

[rhaas pgsql]$ touch x y z
[rhaas pgsql]$ time sh -c 'scp x cthulhu: && scp y cthulhu: && scp z cthulhu:'
x 100%  207KB  78.8KB/s   00:02
y 100%0 0.0KB/s   00:00
z 100%0 0.0KB/s   00:00

real 0m9.418s
user 0m0.045s
sys 0m0.071s
[rhaas pgsql]$ time sh -c 'scp x y z cthulhu:'
x 100%  207KB 273.1KB/s   00:00
y 100%0 0.0KB/s   00:00
z 100%0 0.0KB/s   00:00

real 0m3.216s
user 0m0.017s
sys 0m0.020s

> If we start with parallelism first, the whole ecosystem could
> immediately benefit from it as is.  To be able to handle multiple
> files in a single command, we would need some way to let the server
> know which files were successfully archived and which files weren't,
> so it requires a different communication approach than the command
> return code.

That is possibly true. I think it might work to just assume that you
have to retry everything if it exits non-zero, but that requires the
archive command to be smart enough to do something sensible if an
identical file is already present in the archive.

> But as I said, I'm not convinced that using the archive_command
> approach for that is the best approach.  If I understand correctly,
> most of the backup solutions would prefer to have a daemon being
> launched and use it as a queuing system.  Wouldn't it be better to
> have a new archive_mode, e.g. "daemon", and have postgres responsible
> to (re)start it, and pass information through the daemon's
> stdin/stdout or something like that?

Sure. Actually, I think a background worker would be better than a
separate daemon. Then it could just talk to shared memory directly.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Increase value of OUTER_VAR

2021-09-10 Thread Aleksander Alekseev
Hi hackers,

> > So I'm inclined to propose pushing this and seeing what happens.
>
> +1

+1. The proposed changes will be beneficial in the long term. They
will affect existing extensions. However, the scale of the problem
seems to be exaggerated.

I can confirm that the patch passes installcheck-world. After some
searching through the code, I was unable to identify any places where
the logic will break. Although this only proves my inattention, the
easiest way to make any further progress seems to be to apply the patch.

-- 
Best regards,
Aleksander Alekseev




Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-10 Thread Amul Sul
On Thu, Sep 9, 2021 at 11:12 PM Mark Dilger
 wrote:
>
>

Thank you for looking at the patch.  Please see my reply inline below:

>
> > On Sep 8, 2021, at 6:44 AM, Amul Sul  wrote:
> >
> > Here is the rebased version.
>
> v33-0004
>
> This patch moves the include of "catalog/pg_control.h" from transam/xlog.c 
> into access/xlog.h, making pg_control.h indirectly included from a much 
> larger set of files.  Maybe that's ok.  I don't know.  But it seems you are 
> doing this merely to get the symbol (not even the definition) for struct 
> DBState.  I'd recommend rearranging the code so this isn't necessary, but 
> otherwise you'd at least want to remove the now redundant includes of 
> catalog/pg_control.h from xlogdesc.c, xloginsert.c, auth-scram.c, 
> postmaster.c, misc/pg_controldata.c, and pg_controldata/pg_controldata.c.
>

Yes, you are correct; xlog.h is included in more than 150 files. I was
wondering if we could have a forward declaration instead of including
pg_control.h (the same way struct XLogRecData is declared in
xlog.h). DBState is an enum, and I don't see that we do the same for
enums elsewhere as we do for structures, but that seems fine, IMO.

I was unsure about this before preparing the patch, but since it makes
sense and minimizes duplication, can we go ahead and post it
separately, together with the same change in StartupXLOG(), which I
skipped for the reason mentioned in the patch commit message?

> v33-0005
>
> This patch makes bool XLogInsertAllowed() more complicated than before.  The 
> result used to depend mostly on the value of LocalXLogInsertAllowed except 
> that when that value was negative, the result was determined by 
> RecoveryInProgress().  There was an arcane rule that LocalXLogInsertAllowed 
> must have the non-negative values binary coercible to boolean "true" and 
> "false", with the basis for that rule being the coding of 
> XLogInsertAllowed().  Now that the function is more complicated, this rule 
> seems even more arcane.  Can we change the logic to not depend on casting an 
> integer to bool?
>

We can't use a boolean variable because LocalXLogInsertAllowed
represents three states: 1 means "wal is allowed", 0 means "wal is
disallowed", and -1 means "need to check".

> The code comment change in autovacuum.c introduces a non-grammatical 
> sentence: "First, the system is not read only i.e. wal writes permitted".
>
> The function comment in checkpointer.c reads more like it toggles the system 
> into allowing something, rather than actually doing that same something: 
> "SendSignalToCheckpointer allows a process to send a signal to the checkpoint 
> process".
>
> The new code comment in ipci.c contains a typo, but more importantly, it 
> doesn't impart any knowledge beyond what a reader of the function name could 
> already surmise.  Perhaps the comment can better clarify what is happening: 
> "Set up wal probibit shared state"
>
> The new code comment in sync.c copies and changes a nearby comment but drops 
> part of the verb phrase:  "As in ProcessSyncRequests, we don't want to stop 
> wal prohibit change requests".  The nearby comment reads "stop absorbing".  I 
> think this one should read "stop processing".  This same comment is used 
> again below.   Then a third comment reads "For the same reason mentioned 
> previously for the wal prohibit state change request check."  That third 
> comment is too glib.
>
> tcop/utility.c needlessly includes "access/walprohibit.h"
>
> wait_event.h extends enum WaitEventIO with new values 
> WAIT_EVENT_WALPROHIBIT_STATE and WAIT_EVENT_WALPROHIBIT_STATE_CHANGE.  I 
> don't find the difference between these two names at all clear.  Waiting for 
> a state change is clear enough.  But how is waiting on a state different?
>
> xlog.h defines a new enum.  I don't find any of it clear; not the comment, 
> nor the name of the enum, nor the names of the values:
>
> /* State of work that enables wal writes */
> typedef enum XLogAcceptWritesState
> {
> XLOG_ACCEPT_WRITES_PENDING = 0, /* initial state, not started */
> XLOG_ACCEPT_WRITES_SKIPPED, /* skipped wal writes */
> XLOG_ACCEPT_WRITES_DONE /* wal writes are enabled */
> } XLogAcceptWritesState;
>
> This enum seems to have been written from the point of view of someone who 
> already knew what it was for.  It needs to be written in a way that will be 
> clear to people who have no idea what it is for.
>
> v33-0006:
>
> The new code comments in brin.c and elsewhere should use the verb "require" 
> rather than "have", otherwise "building indexes" reads as a noun phrase 
> rather than as a gerund: /* Building indexes will have an XID */
>

I will think about the code comments you pointed out and improve them.

> The new function CheckWALPermitted() seems to test the current state of 
> variables but not lock any of them, and the new function comment says:
>

CheckWALPermitted() calls XLogInsertAllowed(), which does check the
LocalXLogInsertAllowed flag; that flag is local to the process, and
nobody else reads it concurrently.

Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2021-09-10 Thread Bharath Rupireddy
On Fri, Sep 10, 2021 at 7:21 AM Michael Paquier  wrote:
>
> On Thu, Sep 09, 2021 at 10:49:46PM +, Bossart, Nathan wrote:
> > +1
>
> A backend approach has the advantage that you can use the proper locks
> to make sure that a segment is not recycled or removed by a concurrent
> checkpoint, so that would be reliable.

Thanks for sharing your thoughts. IMO, using locks for showing WAL
stats isn't a good approach, because these new functions may block the
checkpointer from removing/recycling the WAL files.  We don't want to
do that. If the user has asked for stats of an LSN or a range of LSNs,
we provide the info if it is available in the pg_wal directory;
otherwise we can throw warnings/errors. This behaviour is pretty much
in sync with what pg_waldump does right now.

And, some users may not need these new functions at all, so in such
cases going the extension route makes it more usable.

Regards,
Bharath Rupireddy.




Re: parallelizing the archiver

2021-09-10 Thread Julien Rouhaud
On Fri, Sep 10, 2021 at 9:13 PM Robert Haas  wrote:
>
> To me, it seems way more beneficial to think about being able to
> invoke archive_command with many files at a time instead of just one.
> I think for most plausible archive commands that would be way more
> efficient than what you propose here. It's *possible* that if we had
> that, we'd still want this, but I'm not even convinced.

Those approaches don't really seem mutually exclusive?  In both cases
you will need to internally track the status of each WAL file and
handle non-contiguous file sequences.  In case of parallel commands
you only need the additional knowledge that some command is already
working on a file.  Wouldn't it be even better to eventually be able
to launch multiple batches of multiple files rather than a single batch?

If we start with parallelism first, the whole ecosystem could
immediately benefit from it as is.  To be able to handle multiple
files in a single command, we would need some way to let the server
know which files were successfully archived and which files weren't,
so it requires a different communication approach than the command
return code.

But as I said, I'm not convinced that using the archive_command
approach for that is the best approach.  If I understand correctly,
most of the backup solutions would prefer to have a daemon being
launched and use it as a queuing system.  Wouldn't it be better to
have a new archive_mode, e.g. "daemon", and have postgres responsible
to (re)start it, and pass information through the daemon's
stdin/stdout or something like that?




Re: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails

2021-09-10 Thread torikoshia

On 2021-09-09 19:03, Peter Eisentraut wrote:

On 07.09.21 20:31, Tom Lane wrote:

torikoshia  writes:
While working on [1], we found that EXPLAIN(VERBOSE) to CTE with 
SEARCH

BREADTH FIRST ends up ERROR.


Yeah.  It's failing here:

                 * We're deparsing a Plan tree so we don't have a CTE
                 * list.  But the only place we'd see a Var directly
                 * referencing a CTE RTE is in a CteScan plan node, and we
                 * can look into the subplan's tlist instead.
                 */
                if (!dpns->inner_plan)
                    elog(ERROR, "failed to find plan for CTE %s",
                         rte->eref->aliasname);

The problematic Var is *not* in a CteScan plan node; it's in a
WorkTableScan node.  It's not clear to me whether this is a bug
in the planner's handling of SEARCH BREADTH FIRST, or if the plan
is as-intended and ruleutils.c is failing to cope.


The search clause is resolved by the rewriter, so it's unlikely that
the planner is doing something wrong.  Either the rewriting produces
something incorrect (but then one might expect that the query results
would be wrong), or the structures constructed by rewriting are not
easily handled by ruleutils.c.

If we start from the example in the documentation:

"""
WITH RECURSIVE search_tree(id, link, data, depth) AS (
SELECT t.id, t.link, t.data, 0
FROM tree t
  UNION ALL
SELECT t.id, t.link, t.data, depth + 1
FROM tree t, search_tree st
WHERE t.id = st.link
)
SELECT * FROM search_tree ORDER BY depth;

To get a stable sort, add data columns as secondary sorting columns.
"""

In order to handle that part about the stable sort, the query
constructed internally is something like

WITH RECURSIVE search_tree(id, link, data, seq) AS (
SELECT t.id, t.link, t.data, ROW(0, id, link)
FROM tree t
  UNION ALL
SELECT t.id, t.link, t.data, ROW(seq.depth + 1, id, link)
FROM tree t, search_tree st
WHERE t.id = st.link
)
SELECT * FROM search_tree ORDER BY seq;

The bit "seq.depth" isn't really valid when typed in like that, I
think, but of course internally this is all wired together with
numbers rather than identifiers.  I suspect that that is what
ruleutils.c trips over.


Thanks for your advice; it seems right.

EXPLAIN VERBOSE runs without error when I assign a testing-purpose
CoercionForm to 'seq.depth + 1'.


I've attached the patch with the changes made for this test for your
reference, but I'm not sure it's appropriate to create a new
CoercionForm to fix the issue.


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONdiff --git a/src/backend/rewrite/rewriteSearchCycle.c b/src/backend/rewrite/rewriteSearchCycle.c
index c50ebdba24..659a5c440f 100644
--- a/src/backend/rewrite/rewriteSearchCycle.c
+++ b/src/backend/rewrite/rewriteSearchCycle.c
@@ -508,7 +508,7 @@ rewriteSearchAndCycle(CommonTableExpr *cte)
 			fs->resulttype = INT8OID;
 			fs->resulttypmod = -1;
 
-			fexpr = makeFuncExpr(F_INT8INC, INT8OID, list_make1(fs), InvalidOid, InvalidOid, COERCE_EXPLICIT_CALL);
+			fexpr = makeFuncExpr(F_INT8INC, INT8OID, list_make1(fs), InvalidOid, InvalidOid, COERCE_NOMINAL_CALL);
 
 			lfirst(list_head(search_col_rowexpr->args)) = fexpr;
 
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 8ff4e5dc07..c5a5fa06f3 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -9648,6 +9648,12 @@ get_func_expr(FuncExpr *expr, deparse_context *context,
 			return;
 	}
 
+	if (expr->funcformat == COERCE_NOMINAL_CALL)
+	{
+		appendStringInfo(context->buf, "nominal func");
+		return;
+	}
+
 	/*
 	 * Normal function: display as proname(args).  First we need to extract
 	 * the argument datatypes.
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index c04282f91f..a3d600d1a2 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -483,7 +483,8 @@ typedef enum CoercionForm
 	COERCE_EXPLICIT_CALL,		/* display as a function call */
 	COERCE_EXPLICIT_CAST,		/* display as an explicit cast */
 	COERCE_IMPLICIT_CAST,		/* implicit cast, so hide it */
-	COERCE_SQL_SYNTAX			/* display with SQL-mandated special syntax */
+	COERCE_SQL_SYNTAX,			/* display with SQL-mandated special syntax */
+	COERCE_NOMINAL_CALL
 } CoercionForm;
 
 /*


Re: a misbehavior of partition row movement (?)

2021-09-10 Thread Amit Langote
On Fri, Sep 3, 2021 at 12:23 PM Amit Langote  wrote:
> Hi Andrew,
>
> On Fri, Sep 3, 2021 at 6:19 AM Andrew Dunstan  wrote:
> > On 7/13/21 8:09 AM, Amit Langote wrote:
> > > Unfortunately, I don’t think I’ll have time in this CF to solve some
> > > very fundamental issues I found in the patch during the last cycle.
> > > I’m fine with either marking this as RwF for now or move to the next CF.
> >
> > Amit, do you have time now to work on this?
>
> I will take some time next week to take a fresh look at this and post an 
> update.

So I started looking at this today.  I didn't make much of an inroad into
the stumbling block with 0002 patch that I had mentioned back in [1],
though I decided to at least post a rebased version of the patches
that apply.

I think 0001 is independently committable on its own merits,
irrespective of the yet unresolved problems of 0002, a patch to fix
$subject, which I'll continue to work on.

0003 shows a test that crashes the server due to said problem.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

[1] 
https://www.postgresql.org/message-id/CA%2BHiwqHMpNZOc2Z-zgdO9hbJ7wMCOC%3DWpJYszVusZ%3DoE2OTf8w%40mail.gmail.com


v8-0001-Create-foreign-key-triggers-in-partitioned-tables.patch
Description: Binary data


v8-0003-This-test-crashes-the-patch.patch
Description: Binary data


v8-0002-Enforce-foreign-key-correctly-during-cross-partit.patch
Description: Binary data


RE: Support tab completion for upper character inputs in psql

2021-09-10 Thread tanghy.f...@fujitsu.com
On Tuesday, September 7, 2021 5:25 PM, Peter Eisentraut 
 wrote:
>The coding of valid_input_text() seems a bit bulky.  I think you can do  
>the same thing using strspn() without a loop.

Thanks, modified in V9 patch.
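
For reference, the strspn() idiom being suggested is roughly the following
(a sketch only, not the actual V9 code):

#include <stdbool.h>
#include <string.h>

/* true if text consists solely of characters from allowed_chars */
static bool
consists_only_of(const char *text, const char *allowed_chars)
{
	return strspn(text, allowed_chars) == strlen(text);
}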

>The name is also not great.  It's not like other strings are not "valid".

Modified.
valid_input_text() renamed to check_input_text()

>There is also no explanation why that specific set of characters is  
>allowed and not others.  Does it have something to do with identifier  
>syntax?  This needs to be explained.

Added some comments for pg_string_tolower_if_ascii().
For languages like German/Turkish, it's not a good idea to lower-case the input
text because the upper-case words may not retain the same meaning (pointed out in [1]~[3]).

>Seeing that valid_input_text() is always called together with  
>pg_string_tolower(), I think those could be combined into one function,  
>like pg_string_tolower_if_ascii() is whatever.  That would save a lot of  
>repetition.

Modified.

>There are a couple of queries where the result is *not*  
>case-insensitive, namely
>
>Query_for_list_of_enum_values
>Query_for_list_of_available_extension_versions
>
>(and their variants).  These are cases where the query result is not  
>used as an identifier but as a (single-quoted) string.  So that needs to  
>be handled somehow, perhaps by adding a COMPLETE_WITH_QUERY_CS() similar  
>to COMPLETE_WITH_CS().

Hmm, I think 'a (single-quoted) string' identifier behaves the same way with or 
without my patch.
Could you please give me an example of that (to help me figure out why we need
something like COMPLETE_WITH_QUERY_CS())?

>(A test case for the enum case should be doable easily.)

Test added.

BTW, I found that tab completion for enum values is not perfect on HEAD.
Maybe I will fix this problem in another thread.

example:
=# create type pp_colors as enum ('green', 'blue', 'black');
=# ALTER TYPE pp_colors RENAME VALUE 'b[tab]
=# alter type pp_colors rename value 'b'   <- blue is not auto completed as 
expected

[1] https://www.postgresql.org/message-id/1282887.1619151455%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/20210423.13.2058612313278551429.horikyota.ntt%40gmail.com
[3] 
https://www.postgresql.org/message-id/a75a6574c0e3d4773ba20a73d493c2c9983c0657.camel%40cybertec.at

Regards,
Tang



v9-0001-Support-tab-completion-with-a-query-result-for-up.patch
Description:  v9-0001-Support-tab-completion-with-a-query-result-for-up.patch


Re: parallelizing the archiver

2021-09-10 Thread Robert Haas
On Tue, Sep 7, 2021 at 6:36 PM Bossart, Nathan  wrote:
> Based on previous threads I've seen, I believe many in the community
> would like to replace archive_command entirely, but what I'm proposing
> here would build on the existing tools.  I'm currently thinking of
> something a bit like autovacuum_max_workers, but the archive workers
> would be created once and would follow a competing consumers model.

To me, it seems way more beneficial to think about being able to
invoke archive_command with many files at a time instead of just one.
I think for most plausible archive commands that would be way more
efficient than what you propose here. It's *possible* that if we had
that, we'd still want this, but I'm not even convinced.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2021-09-10 Thread Greg Nancarrow
On Fri, Sep 10, 2021 at 12:33 AM Masahiko Sawada  wrote:
>
> Sorry for the late response. I've attached the updated patches that
> incorporate all comments unless I missed something. Please review
> them.
>

Here's some review comments for the v13-0001 patch:

doc/src/sgml/monitoring.sgml

(1)
There's an extra space in the following line, before "processing":

+   OID of the relation that the worker was  processing when the

(2) Suggested wording update:
BEFORE:
+field is always NULL if the error is reported by
AFTER:
+field is always NULL if the error is reported by the

(3) Suggested wording update:
BEFORE:
+by tablesync worker.
AFTER:
+by the tablesync worker.

(4)
Missing "." at end of following description (inconsistent with other doc):

+   Time at which these statistics were last reset

(5) Suggested wording update:
BEFORE:
+ can be granted EXECUTE to run the function.
AFTER:
+ can be granted EXECUTE privilege to run the function.


src/backend/postmaster/pgstat.c

(6) Suggested wording update:
BEFORE:
+ * for this relation already completes or the table is no
AFTER:
+ * for this relation already completed or the table is no


(7)
In the code below, since errmsg.m_nentries only ever gets incremented
by the 1st IF condition, it's probably best to include the 2nd IF
block within the 1st IF condition. Then we can avoid checking
"errmsg.m_nentries" on each loop iteration.

+ if (hash_search(not_ready_rels_htab, (void *) &(errent->relid),
+ HASH_FIND, NULL) == NULL)
+ errmsg.m_relids[errmsg.m_nentries++] = errent->relid;
+
+ /*
+ * If the message is full, send it out and reinitialize to
+ * empty
+ */
+ if (errmsg.m_nentries >= PGSTAT_NUM_SUBSCRIPTIONERRPURGE)
+ {
+ len = offsetof(PgStat_MsgSubscriptionErrPurge, m_relids[0])
+ + errmsg.m_nentries * sizeof(Oid);
+
+ pgstat_setheader(&errmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE);
+ pgstat_send(&errmsg, len);
+ errmsg.m_nentries = 0;
+ }
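
A sketch of the restructuring suggested in (7), reusing the quoted code
(variable and constant names as in the quoted snippet):

if (hash_search(not_ready_rels_htab, (void *) &(errent->relid),
				HASH_FIND, NULL) == NULL)
{
	errmsg.m_relids[errmsg.m_nentries++] = errent->relid;

	/*
	 * If the message is full, send it out and reinitialize to
	 * empty
	 */
	if (errmsg.m_nentries >= PGSTAT_NUM_SUBSCRIPTIONERRPURGE)
	{
		len = offsetof(PgStat_MsgSubscriptionErrPurge, m_relids[0])
			+ errmsg.m_nentries * sizeof(Oid);

		pgstat_setheader(&errmsg.m_hdr, PGSTAT_MTYPE_SUBSCRIPTIONERRPURGE);
		pgstat_send(&errmsg, len);
		errmsg.m_nentries = 0;
	}
}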


(8)
+ * Tell the collector about reset the subscription error.

Is this meant to say "Tell the collector to reset the subscription error." ?


(9)
I think the following:

+ len = offsetof(PgStat_MsgSubscriptionErr, m_errmsg[0]) + strlen(errmsg);

should be:

+ len = offsetof(PgStat_MsgSubscriptionErr, m_errmsg[0]) + strlen(errmsg) + 1;

to account for the \0 terminator.

(10)
I don't think that using the following Assert is really correct here,
because PgStat_MsgSubscriptionErr is not setup to have the maximum
number of m_errmsg[] entries to fill up to PGSTAT_MAX_MSG_SIZE (as are
some of the other pgstat structs):

+ Assert(len < PGSTAT_MAX_MSG_SIZE);

(the max size of all of the pgstat structs is statically asserted anyway)

It would be correct to do the following instead:

+ Assert(strlen(errmsg) < PGSTAT_SUBSCRIPTIONERR_MSGLEN);

The overflow is guarded by the strlcpy() in any case.

(11)
Would be better to write:

+ rc = fwrite(&nerrors, sizeof(nerrors), 1, fpout);

instead of:

+ rc = fwrite(&nerrors, sizeof(int32), 1, fpout);


(12)
Would be better to write:

+ if (fread(&nerrors, 1, sizeof(nerrors), fpin) != sizeof(nerrors))

instead of:

+ if (fread(&nerrors, 1, sizeof(int32), fpin) != sizeof(int32))


src/include/pgstat.h

(13)
BEFORE:
+ * update/reset the error happening during logical
AFTER:
+ * update/reset the error occurring during logical

(14)
Typo:  replicatoin -> replication

+ * an error that occurred during application of logical replicatoin or


(15) Suggested wording update:
BEFORE:
+ * there is no table sync error, where is the common case in practice.
AFTER:
+ * there is no table sync error, which is the common case in practice.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

2021-09-10 Thread Etsuro Fujita
On Fri, Sep 10, 2021 at 1:00 AM Tom Lane  wrote:
> Etsuro Fujita  writes:
> > Having said that, I think another option for this would be to left the
> > code as-is; assume that 1) the foreign var has "COLLATE default”, not
> > an unknown collation, when labeled with "COLLATE default”, and 2)
> > "COLLATE default” on the local database matches "COLLATE default” on
> > the remote database.
>
> The fundamental complaint that started this thread was exactly that
> assumption (2) isn't safe.  So it sounds to me like you're proposing
> that we do nothing, which isn't a great answer either.  I suppose
> we could try documenting our way out of this, but people will
> continue to get bit because they won't read or won't understand
> the limitation.

Yeah, but IMO it’s the user’s responsibility to make sure that the
local and remote default collations match when labeling collatable
columns with “COLLATE default” while defining foreign tables manually.

> I'd be happier if we had a way to check whether the local and remote
> default collations are compatible.  But it seems like that's a big ask,
> especially in cross-operating-system situations.

Agreed.

Best regards,
Etsuro Fujita




The End of the WAL

2021-09-10 Thread Christoph Berg
Re: Heikki Linnakangas
> On 09/09/2021 17:25, Tom Lane wrote:
> > Having done that, the check in md.c could be reduced to an Assert,
> > although there's something to be said for leaving it as-is as an
> > extra layer of defense.
> 
> Some operations call smgrextend() directly, like B-tree index creation. We
> don't want those operations to hit an assertion either.

Thanks, I can now see a proper error on 15devel with cassert enabled:

# insert into huge select generate_series(1,1000);
FEHLER:  54000: cannot extend relation base/13550/16384 beyond 4294967295 blocks
ORT:  ReadBuffer_common, bufmgr.c:831



Some months ago I had already tried what happens on running into
another limit, namely the end of WAL. Back then I was attributing the
result to "won't happen anyway", but since we are talking asserts,
there is the result:

/usr/lib/postgresql/15/bin/pg_resetwal -l 000100FF -D $PWD
 select pg_current_wal_lsn();
 pg_current_wal_lsn

 /FF000150
create table foo (id bigint);
repeat a few times: insert into foo select generate_series(1,20);

15devel, cassert:

TRAP: FailedAssertion("XLogRecPtrToBytePos(*EndPos) == endbytepos", File: 
"./build/../src/backend/access/transam/xlog.c", Line: 1324, PID: 1651661)
postgres: 15/regress: cbe postgres ::1(45564) 
INSERT(ExceptionalCondition+0x9a)[0x564ad15461ba]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x223022)[0x564ad115f022]
postgres: 15/regress: cbe postgres ::1(45564) 
INSERT(XLogInsert+0x653)[0x564ad116adf3]
postgres: 15/regress: cbe postgres ::1(45564) 
INSERT(heap_insert+0x3ae)[0x564ad10f0a2e]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x1bf8e9)[0x564ad10fb8e9]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x35e30c)[0x564ad129a30c]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x35eedc)[0x564ad129aedc]
postgres: 15/regress: cbe postgres ::1(45564) 
INSERT(standard_ExecutorRun+0x115)[0x564ad12695b5]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x4da312)[0x564ad1416312]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x4db0ee)[0x564ad14170ee]
postgres: 15/regress: cbe postgres ::1(45564) 
INSERT(PortalRun+0x2ec)[0x564ad14176bc]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x4d72b6)[0x564ad14132b6]
postgres: 15/regress: cbe postgres ::1(45564) 
INSERT(PostgresMain+0x181c)[0x564ad1414edc]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(+0x43fd80)[0x564ad137bd80]
postgres: 15/regress: cbe postgres ::1(45564) 
INSERT(PostmasterMain+0xca0)[0x564ad137cd10]
postgres: 15/regress: cbe postgres ::1(45564) INSERT(main+0x221)[0x564ad10973d1]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7f63bcbe1d0a]
postgres: 15/regress: cbe postgres ::1(45564) 
INSERT(_start+0x2a)[0x564ad10979ca]
2021-09-10 11:44:40.997 CEST [1651617] LOG:  request to flush past end of 
generated WAL; request /E000, current position 0/50
2021-09-10 11:44:42.019 CEST [1651613] LOG:  Serverprozess (PID 1651661) wurde 
von Signal 6 beendet: Abgebrochen
2021-09-10 11:44:42.019 CEST [1651613] DETAIL:  Der fehlgeschlagene Prozess 
führte aus: insert into foo select generate_series(1,20);

The system properly (?) recovers, resuming at some /FFE614B8
position (i.e. discarding the part that was overflowing). However, if
I push it by moving closer to the end by doing smaller inserts, I can
get it into an infinite recovery loop:

2021-09-10 11:48:41.050 CEST [1652403] LOG:  Datenbanksystem wurde beim 
Herunterfahren unterbrochen; letzte bekannte Aktion am 2021-09-10 11:48:40 CEST
2021-09-10 11:48:41.051 CEST [1652403] LOG:  Datenbanksystem wurde nicht 
richtig heruntergefahren; automatische Wiederherstellung läuft
2021-09-10 11:48:41.051 CEST [1652403] LOG:  Redo beginnt bei /DF78
2021-09-10 11:48:41.051 CEST [1652403] LOG:  ungültige Datensatzlänge bei 
/DFB0: 24 erwartet, 0 erhalten
2021-09-10 11:48:41.051 CEST [1652403] LOG:  redo done at /DF78 
system usage: CPU: Benutzer: 0,00 s, System: 0,00 s, verstrichen: 0,00 s
TRAP: FailedAssertion("((XLogPageHeader) cachedPos)->xlp_magic == 
XLOG_PAGE_MAGIC", File: "./build/../src/backend/access/transam/xlog.c", Line: 
1982, PID: 1652404)
postgres: 15/regress: checkpointer performing end-of-recovery 
checkpoint(ExceptionalCondition+0x9a)[0x564ad15461ba]
postgres: 15/regress: checkpointer performing end-of-recovery 
checkpoint(+0x2221e8)[0x564ad115e1e8]
postgres: 15/regress: checkpointer performing end-of-recovery 
checkpoint(XLogInsertRecord+0x587)[0x564ad115ea27]
postgres: 15/regress: checkpointer performing end-of-recovery 
checkpoint(XLogInsert+0x653)[0x564ad116adf3]
postgres: 15/regress: checkpointer performing end-of-recovery 
checkpoint(CreateCheckPoint+0x64e)[0x564ad11608ee]
postgres: 15/regress: checkpointer performing end-of-recovery 
checkpoint(CheckpointerMain+0x3d4)[0x564ad136db34]
postgres: 15/regress: checkpointer performing end-of-recovery 

Re: Schema variables - new implementation for Postgres 15

2021-09-10 Thread Pavel Stehule
On Fri, Sep 10, 2021 at 10:32 Erik Rijkers  wrote:

> On 9/10/21 10:06 AM, Pavel Stehule wrote:
> > Hi
> >
> > On Thu, Sep 9, 2021 at 12:21 Erik Rijkers  wrote:
> >
> >>
> >> Hi Pavel,
> >>
> >> The patch applies and compiles fine but 'make check' for the
> >> assert-enabled fails on 131 out of 210 tests.
> >>
> >> (while compiling HEAD checks run without errors for both assert-disabled
> >> and assert-enabled)
> >>
> >>
> >
> > Please, check, attached patch. I fixed a routine for processing a list of
> > identifiers - now it works with the identifier's node more sensitive.
> > Previous implementation of strVal was more tolerant.
>
>  > [schema-variables-20210910.patch]
>
> Apply, compile, make, & check(-world), and my small testsuite OK.
>
> So all's well again - Ready for committer!
>

Thank you for the check and for the report.

Regards

Pavel


> Thanks,
>
> Erik Rijkers
>
>
> > Regards
> >
> > Pavel
> >
> >
> >>
> >
>


Re: Schema variables - new implementation for Postgres 15

2021-09-10 Thread Erik Rijkers

On 9/10/21 10:06 AM, Pavel Stehule wrote:

Hi

On Thu, Sep 9, 2021 at 12:21 Erik Rijkers  wrote:



Hi Pavel,

The patch applies and compiles fine but 'make check' for the
assert-enabled fails on 131 out of 210 tests.

(while compiling HEAD checks run without errors for both assert-disabled
and assert-enabled)




Please, check, attached patch. I fixed a routine for processing a list of
identifiers - now it works with the identifier's node more sensitive.
Previous implementation of strVal was more tolerant.


> [schema-variables-20210910.patch]

Apply, compile, make, & check(-world), and my small testsuite OK.

So all's well again - Ready for committer!

Thanks,

Erik Rijkers



Regards

Pavel











Re: Teach pg_receivewal to use lz4 compression

2021-09-10 Thread gkokolatos


‐‐‐ Original Message ‐‐‐

On Friday, July 9th, 2021 at 04:49, Michael Paquier  wrote:

Hi,

please find v3 of the patch attached, rebased to the current head.

> > Michael Paquier wrote:
> >
>
> * http://www.zlib.org/rfc-gzip.html.
>
> -   -   For lz4 compressed segments
> */
> This comment is incomplete.

Fixed.

> +#define IsLZ4CompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
>
> +#define IsLZ4PartialCompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
>
> This is getting complicated. Would it be better to change this stuff
> and switch to a routine that checks if a segment has a valid name, is
> partial, and the type of compression that applied to it? It seems to
> me that we should group iszlibcompress and islz4compress together with
> the options available through compression_method.

Agreed and done.


> -   if (compresslevel != 0)
> -   {
> - if (compression_method == COMPRESSION_NONE)
> - {
> - compression_method = COMPRESSION_ZLIB;
> - }
> - if (compression_method != COMPRESSION_ZLIB)
> - {
> - pg_log_error("cannot use --compress when "
> -  "--compression-method is not gzip");
> - fprintf(stderr, _("Try \\"%s --help\\" for more 
> information.\\n"),
> - progname);
> - exit(1);
> - }
> -   }
>
> For compatibility where --compress enforces the use of zlib that would
> work, but this needs a comment explaining the goal of this block. I
> am wondering if it would be better to break the will and just complain
> when specifying --compress without --compression-method though. That
> would cause compatibility issues, but this would make folks aware of
> the presence of LZ4, which does not sound bad to me either as ZLIB is
> slower than LZ4 on all fronts.

Fair point. In v3 of the patch --compress requires --compression-method. Passing
0 as value errors out.

> -   else if (compression_method == COMPRESSION_ZLIB)
> -   {
> - pg_log_error("cannot use --compression-method gzip when "
> -  "--compression is 0");
> - fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
> - progname);
> - exit(1);
> -   }
>
> Hmm. It would be more natural to enforce a default compression level
> in this case? The user is asking for a compression with zlib here.

Agreed. A default value of 5, which is the midpoint of the available options, has been
defined and used.

In addition, the tests have been adjusted to mimic the newly added gzip tests.


Cheers,
//Georgios


> ---
>
> Michael

v3-0001-Teach-pg_receivewal-to-use-lz4-compression.patch
Description: Binary data


Re: Schema variables - new implementation for Postgres 15

2021-09-10 Thread Pavel Stehule
Hi

On Thu, Sep 9, 2021 at 12:21 Erik Rijkers  wrote:

>  > [schema-variables-20210909.patch]
>
> Hi Pavel,
>
> The patch applies and compiles fine but 'make check' for the
> assert-enabled fails on 131 out of 210 tests.
>
> (while compiling HEAD checks run without errors for both assert-disabled
> and assert-enabled)
>
>

Please, check, attached patch. I fixed a routine for processing a list of
identifiers - now it works with the identifier's node more sensitive.
Previous implementation of strVal was more tolerant.

Regards

Pavel


>


schema-variables-20210910.patch.gz
Description: application/gzip


Re: Add jsonlog log_destination for JSON server logs

2021-09-10 Thread Michael Paquier
On Fri, Sep 10, 2021 at 01:07:00PM +0900, Michael Paquier wrote:
> Forking a bit this thread while looking at 0002 that adds new tests
> for csvlog.  While I agree that it would be useful to have more
> coverage with the syslogger message chunk protocol in this area, I
> think that having a separate test is a waste of resources.  Creating a
> new node is not cheap either, and this adds more wait phases, making
> the tests take longer.  It would be much better to extend
> 004_logrotate.pl and update it to use log_destination = 'stderr,
> csvlog', to minimize the number of nodes we create as well as the
> additional amount of time we'd spend for those tests.  Plugging in
> JSON into that would not be complicated either once we have in place a
> set of small routines that limit the code duplication between the
> checks for each log destination type.

And this part leads me to the attached, where the addition of the JSON
format would result in the addition of a couple of lines.
--
Michael
diff --git a/src/bin/pg_ctl/t/004_logrotate.pl 
b/src/bin/pg_ctl/t/004_logrotate.pl
index fa14b98c7d..aa0d64a4f7 100644
--- a/src/bin/pg_ctl/t/004_logrotate.pl
+++ b/src/bin/pg_ctl/t/004_logrotate.pl
@@ -6,15 +6,64 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 5;
+use Test::More tests => 10;
 use Time::HiRes qw(usleep);
 
+# Extract the file name of a $format from the contents of
+# current_logfiles.
+sub fetch_file_name
+{
+   my $logfiles = shift;
+   my $format   = shift;
+   my @lines= split(/\n/, $logfiles);
+   my $filename = undef;
+   foreach my $line (@lines)
+   {
+   if ($line =~ /$format (.*)$/gm)
+   {
+   $filename = $1;
+   }
+   }
+
+   return $filename;
+}
+
+# Check for a pattern in the logs associated to one format.
+sub check_log_pattern
+{
+   my $format   = shift;
+   my $logfiles = shift;
+   my $pattern  = shift;
+   my $node = shift;
+   my $lfname   = fetch_file_name($logfiles, $format);
+
+   my $max_attempts = 180 * 10;
+
+   my $logcontents;
+   for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
+   {
+   $logcontents = slurp_file($node->data_dir . '/' . $lfname);
+   last if $logcontents =~ m/$pattern/;
+   usleep(100_000);
+   }
+
+   like($logcontents, qr/$pattern/,
+   "found expected log file content for $format");
+
+   # While we're at it, test pg_current_logfile() function
+   is( $node->safe_psql('postgres', "SELECT 
pg_current_logfile('$format')"),
+   $lfname,
+   "pg_current_logfile() gives correct answer with $format");
+   return;
+}
+
 # Set up node with logging collector
 my $node = PostgresNode->new('primary');
 $node->init();
 $node->append_conf(
'postgresql.conf', qq(
 logging_collector = on
+log_destination = 'stderr, csvlog'
 # these ensure stability of test results:
 log_rotation_age = 0
 lc_messages = 'C'
@@ -44,26 +93,12 @@ note "current_logfiles = $current_logfiles";
 
 like(
$current_logfiles,
-   qr|^stderr log/postgresql-.*log$|,
+   qr|^stderr log/postgresql-.*log
+csvlog log/postgresql-.*csv$|,
'current_logfiles is sane');
 
-my $lfname = $current_logfiles;
-$lfname =~ s/^stderr //;
-chomp $lfname;
-
-my $first_logfile;
-for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
-{
-   $first_logfile = slurp_file($node->data_dir . '/' . $lfname);
-   last if $first_logfile =~ m/division by zero/;
-   usleep(100_000);
-}
-
-like($first_logfile, qr/division by zero/, 'found expected log file content');
-
-# While we're at it, test pg_current_logfile() function
-is($node->safe_psql('postgres', "SELECT pg_current_logfile('stderr')"),
-   $lfname, 'pg_current_logfile() gives correct answer');
+check_log_pattern('stderr', $current_logfiles, 'division by zero', $node);
+check_log_pattern('csvlog', $current_logfiles, 'division by zero', $node);
 
 # Sleep 2 seconds and ask for log rotation; this should result in
 # output into a different log file name.
@@ -84,28 +119,14 @@ note "now current_logfiles = $new_current_logfiles";
 
 like(
$new_current_logfiles,
-   qr|^stderr log/postgresql-.*log$|,
+   qr|^stderr log/postgresql-.*log
+csvlog log/postgresql-.*csv$|,
'new current_logfiles is sane');
 
-$lfname = $new_current_logfiles;
-$lfname =~ s/^stderr //;
-chomp $lfname;
-
 # Verify that log output gets to this file, too
-
 $node->psql('postgres', 'fee fi fo fum');
 
-my $second_logfile;
-for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
-{
-   $second_logfile = slurp_file($node->data_dir . '/' . $lfname);
-   last if $second_logfile =~ m/syntax error/;
-   usleep(100_000);
-}
-
-like(
-   $second_logfile,
-   qr/syntax error/,
-   'found expected log file content in new log file');

Re: missing warning in pg_import_system_collations

2021-09-10 Thread Anton Voloshin

On 10/09/2021 01:37, Tom Lane wrote:

Another approach we could take is to deem the comment incorrect and
just remove it, codifying the current behavior of silently ignoring
unrecognized encodings.  The reason that seems like it might be
appropriate is that the logic immediately below this bit silently
ignores encodings that are known but are frontend-only:

    if (!PG_VALID_BE_ENCODING(enc))
        continue;           /* ignore locales for client-only encodings */

It's sure not very clear to me why one case deserves a message and the
other not.  Perhaps they both do, which would lead to adding another
DEBUG1 message here.


I'm not an expert in locales, but I think it makes some sense to be 
silent about encodings we have consciously decided to ignore: we have 
them in our tables, but marked as frontend-only 
(!PG_VALID_BE_ENCODING(enc)).
Likewise, it makes sense to give a debug-level warning about encodings 
seen in the locale -a output but not recognized by us at all 
(pg_get_encoding_from_locale(localebuf, false) < 0).


Therefore I think your patch with the duplicated error message is better 
than what we have currently. I don't see how adding a debug-level message 
about skipping frontend-only encodings would be of any significant use here.
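
For reference, roughly what that change could look like inside the loop over 
the "locale -a" output in pg_import_system_collations() (a sketch only; the 
exact message wording is an assumption, not copied from the patch):

	enc = pg_get_encoding_from_locale(localebuf, false);
	if (enc < 0)
	{
		/* sketch: make the skip visible at DEBUG1 instead of staying silent */
		elog(DEBUG1, "skipping locale with unrecognized encoding: \"%s\"",
			 localebuf);
		continue;
	}

	if (!PG_VALID_BE_ENCODING(enc))
		continue;			/* ignore locales for client-only encodings */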


Unless someone more experienced in locales' subtleties would like to 
chime in.


--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru




Re: parallelizing the archiver

2021-09-10 Thread Andrey Borodin



> On 10 Sep 2021, at 11:11, Julien Rouhaud  wrote:
> 
> On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin  wrote:
>> 
>>> On 10 Sep 2021, at 10:52, Julien Rouhaud  wrote:
>>> 
>>> Yes, but it also means that it's up to every single archiving tool to
>>> implement a somewhat hackish parallel version of an archive_command,
>>> hoping that core won't break it.
>> I'm not proposing to remove the existing archive_command, just to deprecate
>> its one-WAL-per-call form.
> 
> Which is a big API break.
Huge extension, not a break.

>> It's a very simplistic approach. If some GUC is set, the archiver will just
>> feed ready files to the stdin of the archive command. What fundamental design
>> changes do we need?
> 
> I'm talking about the commands themselves.  Your suggestion is to
> change archive_command to be able to spawn a daemon, and it looks like
> a totally different approach.  I'm not saying that having a daemon
> based approach to take care of archiving is a bad idea, I'm saying
> that trying to fit that with the current archive_command + some new
> GUC looks like a bad idea.
It fits nicely, even in corner cases. E.g. a restore_command run from pg_rewind 
seems compatible with this approach.
One more example: after a failover the DBA can just run ```ls | wal-g wal-push``` 
to archive all the WALs left unarchived before the network partition.

This is a simple yet powerful approach, without any contradiction to the 
existing archive_command API.
Why is it a bad idea?
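
A minimal sketch of such a tool (an illustration only, not wal-g code and not 
part of any actual patch), assuming a line-based protocol: the server writes 
one ready segment name per line to the tool's stdin, and the tool echoes the 
name back on stdout once the file is safely archived. The destination 
directory and the pg_wal-relative source paths are assumptions of the sketch, 
and the "archiving" is just a local copy.

/*
 * toy_wal_archiver.c: reads one ready WAL segment name per line from stdin,
 * copies it from pg_wal/ into ARCHIVE_DIR, and reports success by echoing
 * the name back on stdout.  Purely illustrative.
 */
#include <stdio.h>
#include <string.h>

#define ARCHIVE_DIR "/mnt/archive"	/* assumed destination */

static int
copy_file(const char *src, const char *dst)
{
	FILE	   *in = fopen(src, "rb");
	FILE	   *out;
	char		buf[8192];
	size_t		n;

	if (in == NULL)
		return -1;
	out = fopen(dst, "wb");
	if (out == NULL)
	{
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
	{
		if (fwrite(buf, 1, n, out) != n)
		{
			fclose(in);
			fclose(out);
			return -1;
		}
	}
	fclose(in);
	if (fclose(out) != 0)
		return -1;
	return 0;
}

int
main(void)
{
	char		line[1024];

	while (fgets(line, sizeof(line), stdin) != NULL)
	{
		char		src[2048];
		char		dst[2048];

		line[strcspn(line, "\r\n")] = '\0';		/* strip the trailing newline */
		if (line[0] == '\0')
			continue;

		/* the server is assumed to send file names relative to pg_wal */
		snprintf(src, sizeof(src), "pg_wal/%s", line);
		snprintf(dst, sizeof(dst), ARCHIVE_DIR "/%s", line);

		if (copy_file(src, dst) == 0)
		{
			/* report completion so the server can recycle the segment */
			printf("%s\n", line);
			fflush(stdout);
		}
		else
		{
			fprintf(stderr, "archiving of \"%s\" failed\n", line);
			/* keep running; the server may retry the segment later */
		}
	}
	return 0;
}

The server side would then only need to keep such a tool running, write the 
names of ready segments to its stdin, and mark a segment as archived once its 
name comes back on stdout.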

Best regards, Andrey Borodin.



Re: parallelizing the archiver

2021-09-10 Thread Julien Rouhaud
On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin  wrote:
>
> > On 10 Sep 2021, at 10:52, Julien Rouhaud  wrote:
> >
> > Yes, but it also means that it's up to every single archiving tool to
> > implement a somewhat hackish parallel version of an archive_command,
> > hoping that core won't break it.
> I'm not proposing to remove the existing archive_command, just to deprecate
> its one-WAL-per-call form.

Which is a big API break.

> It's a very simplistic approach. If some GUC is set, the archiver will just feed
> ready files to the stdin of the archive command. What fundamental design changes
> do we need?

I'm talking about the commands themselves.  Your suggestion is to
change archive_command to be able to spawn a daemon, and it looks like
a totally different approach.  I'm not saying that having a daemon
based approach to take care of archiving is a bad idea, I'm saying
that trying to fit that with the current archive_command + some new
GUC looks like a bad idea.




Re: parallelizing the archiver

2021-09-10 Thread Andrey Borodin



> On 10 Sep 2021, at 10:52, Julien Rouhaud  wrote:
> 
> On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin  wrote:
>> 
>> It's OK if the external tool is responsible for concurrency. Do we want this
>> complexity in core? Many users do not enable archiving at all.
>> Maybe just add a parallelism API for the external tool?
>> It's much easier to control concurrency in an external tool than in PostgreSQL
>> core. Maintaining a parallel worker is tremendously harder than spawning a
>> goroutine, thread, task or whatever.
> 
> Yes, but it also means that it's up to every single archiving tool to
> implement a somewhat hackish parallel version of an archive_command,
> hoping that core won't break it.
I'm not proposing to remove the existing archive_command, just to deprecate 
its one-WAL-per-call form.

>  If this problem is solved in
> postgres core without an API change, then all existing tools will
> automatically benefit from it (maybe not the ones that used to have
> hacks to make it parallel though, but it seems easier to disable it
> rather than implement it).
True, hacky tools can already coordinate a swarm of their processes and are 
prepared to be called multiple times concurrently :)

>> The external tool needs to know when an xlog segment is ready and needs to
>> report when it's done. Postgres should just ensure that the external
>> archiver/restorer is running.
>> For example, the external tool could read xlog names from stdin and report
>> finished files on stdout. I can prototype such a tool swiftly :)
>> E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment
>> filenames on stdin. And no more listing of archive_status and hacky
>> algorithms to predict the next WAL name and completion time!
> 
> Yes, but that requires fundamental design changes for the archive
> commands, right?  So while I agree it could be a better approach
> overall, it seems like a longer-term option.  As far as I understand,
> what Nathan suggested seems more likely to be achieved in pg15 and
> could benefit a larger set of backup solutions.  This can give us
> enough time to properly design a new archiving approach.

It's a very simplistic approach. If some GUC is set, the archiver will just feed 
ready files to the stdin of the archive command. What fundamental design changes 
do we need?

Best regards, Andrey Borodin.