Re: failures in t/031_recovery_conflict.pl on CI

2022-04-08 Thread Andres Freund
Hi,

On 2022-04-08 21:55:15 -0700, Andres Freund wrote:
> on CI [1] the new t/031_recovery_conflict.pl is failing occasionally. Which is
> interesting, because I ran it there dozens if not hundreds of times before
> commit, with - I think - only cosmetic changes.

Scratch that part - I found an instance of the freebsd failure earlier, just
didn't notice because that run failed for other reasons as well. So this might
just have uncovered an older bug around recovery conflict handling,
potentially platform dependent.

I guess I'll try to reproduce it on freebsd...

Greetings,

Andres Freund




failures in t/031_recovery_conflict.pl on CI

2022-04-08 Thread Andres Freund
Hi,

on CI [1] the new t/031_recovery_conflict.pl is failing occasionally. Which is
interesting, because I ran it there dozens if not hundreds of times before
commit, with - I think - only cosmetic changes.

I've reproduced it in a private branch, with more logging. And the results are
sure interesting.

https://cirrus-ci.com/task/6448492666159104
https://api.cirrus-ci.com/v1/artifact/task/6448492666159104/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log

The primary is waiting for 0/343A000 to be applied, which requires a recovery
conflict to be detected and resolved. On the standby there's the following
sequence (some omitted):

prerequisite for recovery conflict:
2022-04-09 04:05:31.292 UTC [35071][client backend] 
[031_recovery_conflict.pl][2/2:0] LOG:  statement: BEGIN;
2022-04-09 04:05:31.292 UTC [35071][client backend] 
[031_recovery_conflict.pl][2/2:0] LOG:  statement: DECLARE 
test_recovery_conflict_cursor CURSOR FOR SELECT b FROM 
test_recovery_conflict_table1;
2022-04-09 04:05:31.293 UTC [35071][client backend] 
[031_recovery_conflict.pl][2/2:0] LOG:  statement: FETCH FORWARD FROM 
test_recovery_conflict_cursor;

detecting the conflict:
2022-04-09 04:05:31.382 UTC [35038][startup] LOG:  recovery still waiting after 
28.821 ms: recovery conflict on buffer pin
2022-04-09 04:05:31.382 UTC [35038][startup] CONTEXT:  WAL redo at 0/3432800 
for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 1; blkref #0: rel 
1663/16385/16386, blk 0

and then nothing until the timeout:
2022-04-09 04:09:19.317 UTC [35035][postmaster] LOG:  received immediate 
shutdown request
2022-04-09 04:09:19.317 UTC [35035][postmaster] DEBUG:  sending signal 3 to 
process 35071
2022-04-09 04:09:19.320 UTC [35035][postmaster] DEBUG:  reaping dead processes
2022-04-09 04:09:19.320 UTC [35035][postmaster] DEBUG:  reaping dead processes
2022-04-09 04:09:19.320 UTC [35035][postmaster] DEBUG:  server process (PID 
35071) exited with exit code 2

Afaics that has to mean something is broken around sending, receiving or
processing of recovery conflict interrupts.


All the failures so far were on freebsd, from what I can see. There were other
failures in other tests, but I think those were due to things that have since been
reverted or fixed.


Except for not previously triggering while the shmstats patch was in
development, it's hard to tell whether it's a regression or just a
longstanding bug - we never had tests for recovery conflicts...


I don't really see how recovery prefetching could play a role here, clearly
we've been trying to replay the record. So we're elsewhere...

Greetings,

Andres Freund

https://cirrus-ci.com/github/postgres/postgres/master




Re: Add parameter jit_warn_above_fraction

2022-04-08 Thread Julien Rouhaud
Hi,

On Fri, Apr 08, 2022 at 09:39:18AM -0400, Stephen Frost wrote:
>
> * Magnus Hagander (mag...@hagander.net) wrote:
> > The addition to pg_stat_statements I pushed a short while ago would help
> > with that. But I think having a warning like this would also be useful. As
> > a stop-gap measure, yes, but we really don't know when we will have an
> > improved costing model for it. I hope you're right and that we can have it
> > by 16, and then I will definitely advocate for removing the warning again
> > if it works.
>
> Having this in pg_stat_statements is certainly helpful but having a
> warning also is.  I don't think we have to address this in only one way.
> A lot faster to flip this guc and then look in the logs on a busy system
> than to install pg_stat_statements, restart the cluster once you get
> permission to do so, and then query it.

+1, especially if you otherwise don't really need or want to have
pg_stat_statements enabled, as it's far from being free.  Sure you could enable
it by default with pg_stat_statements.track = none, but that seems a lot more
troublesome than just dynamically enabling a GUC, possibly for a short time
and/or for a specific database/role.




Re: Expose JIT counters/timing in pg_stat_statements

2022-04-08 Thread Julien Rouhaud
Hi,

On Sat, Apr 09, 2022 at 01:51:21AM +, Shinoda, Noriyoshi (PN Japan FSIP) 
wrote:
> Hi,
> thank you for the great features.
>
> The attached small patch changes the data type in the document.
> The following columns are actually double precision but bigint in the docs.
> jit_generation_time
> jit_inlining_time
> jit_optimization_time
> jit_emission_count

Indeed!  The patch looks good to me, I didn't find any other discrepancy.




Commitfest wrapup

2022-04-08 Thread Greg Stark
These remaining CF entries look like they're bugs that are maybe Open
Issues for release?

* fix spinlock contention in LogwrtResult

* Avoid erroring out when unable to remove or parse logical rewrite
files to save checkpoint work

* Add checkpoint and redo LSN to LogCheckpointEnd log message

* standby recovery fails when re-replaying due to missing directory
which was removed in previous replay.

* Logical replication failure "ERROR: could not map filenode
"base/13237/442428" to relation OID" with catalog modifying txns

* fix psql pattern handling

* Possible fails in pg_stat_statements test

* pg_receivewal fail to streams when the partial file to write is not
fully initialized present in the wal receiver directory

* Fix pg_rewind race condition just after promotion


Was the plan to commit this after feature freeze?

* pg_stat_statements: Track statement entry timestamp



A couple minor documentation, testing, and diagnostics patches that
may be committable even after feature freeze?

* Improve role attributes docs

* Reloption tests improvement. Test resetting illegal option that was
actually set for some reason

* Make message at end-of-recovery less scary

* jit_warn_above_fraction parameter


And then there are the more devilish cases. I think some of these
patches are Rejected or Returned with Feedback but I'm not certain.
Some of them have split into multiple discussions or are partly
committed but still related work remains. Any opinions on what to do
with these?

* Simplify some RI checks to reduce SPI overhead

* Map WAL segment files on PMEM as WAL buffers

* Support custom authentication methods using hooks

* Implement INSERT SET syntax

* Logical insert/update/delete WAL records for custom table AMs


-- 
greg




cfbot requests

2022-04-08 Thread Justin Pryzby
I mentioned most/all of these ideas for cfbot at some point.  I'm writing them
now so other people know about them and they're in one place.

 - Keep the original patch series and commit messages, rather than squishing
them into a single commit with cfbot's own commit messages.  Maybe append an
empty commit with cfbot's message, and include a parsable "base-branch: NNN"
commit hash.  This supports some CI ideas like making HTML available for
patches touching doc/ (which needs to be relative to some other commit to show
only the changes rather than every *.html).  And code coverage report for
changed files, which has the same requirement.

That *also* allows directly reviewing the original patch series with a branch
maintained by cfbot, preserving the patch set, with commit messages.  See also:
https://www.postgresql.org/message-id/flat/CAKU4AWoU-P1zPS5hmiXpto6WGLOqk27VgCrxSKE2mgX%3DfypV6Q%40mail.gmail.com

Alternate idea: submit the patch to cirrus as a PR which makes the "base
branch" available to cirrus as an environment variable (note that PRs also
change a couple of other cirrus behaviors).

 - I think cfbot already keeps track of historic CI build results (pass/fail
and link to cirrus).  But I don't think cfbot exposes this yet.  I know cirrus
can show history for a branch, but I can never find it, and their history is
limited.  This would be like the buildfarm pages, ordered by time: one showing
all results, one showing all failures, and one showing all results for a given
patch.  You could also consider retrieving the cirrus logs themselves, to allow
setting our own retention interval (we could ask cirrus if they'd want to allow
setting more aggressive expiration for logs/artifacts).

 - HTML: sort by CF ID rather than alpha sort.  Right now, commitfest entries
beginning with a capital letter sort first, which at least one person seems to
have discovered.

 - HTML: add a "queued for CI" page showing the next patches to be submitted to
cirrus.  These pages might allow queueing a re-run, too.

 - HTML: show "target version" and "committer" (maybe committer name would be
shown in the existing list of names, but with another style applied).  This 
helps
to distinguish between patches which someone optimistically said was RFC and a
patch which a committer intends to commit, which ought to be pretty visible so
it's not lost in the mailing list and a surprise to anyone.

 - index on CF ID without CFNUM - reasons that you mentioned that I can't
remember (like maybe cirrus history and rebuilds at end of each CF).




Re: PostgreSQL commitfest: 2022-09-01

2022-04-08 Thread Justin Pryzby
On Fri, Apr 08, 2022 at 12:55:04PM +, webmas...@postgresql.org wrote:
> The following patches that you follow, directly or indirectly,
> have received updates in the PostgreSQL commitfest app:
> 
> 
> Function to log backtrace of postgres processes
> https://commitfest.postgresql.org/38/2863/
> 
> * New status: Needs review (stark)
> * Closed in commitfest 2022-03 with status: Moved to next CF (stark)
> 
> 
> 
> Add --schema and --exclude-schema options to vacuumdb
> https://commitfest.postgresql.org/39/3587/
> 
> * New status: Needs review (stark)
> * Closed in commitfest 2022-07 with status: Moved to next CF (stark)

The 2nd patch was already in July's CF, and it looks like you accidentally
moved it to July+1.  AFAIK this requires a DBA to fix.




RE: Expose JIT counters/timing in pg_stat_statements

2022-04-08 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi,
thank you for the great features.

The attached small patch changes the data type in the document.
The following columns are actually double precision but bigint in the docs.
jit_generation_time
jit_inlining_time
jit_optimization_time
jit_emission_count

Regards,
Noriyoshi Shinoda

From: Magnus Hagander 
Sent: Friday, April 8, 2022 8:47 PM
To: Julien Rouhaud 
Cc: PostgreSQL Developers 
Subject: Re: Expose JIT counters/timing in pg_stat_statements



On Tue, Mar 8, 2022 at 4:08 AM Julien Rouhaud <rjuju...@gmail.com> wrote:
On Mon, Mar 07, 2022 at 01:40:34PM +0100, Magnus Hagander wrote:
>
> I wonder if there might be an interesting middle ground, or if that is
> making it too much. That is, we could have an
> Option 3:
> jit_count
> total_jit_time - for sum of functions+inlining+optimization+emission time
> min_jit_time - for sum of functions+inlining+optimization+emission time
> max_jit_time - for sum of functions+inlining+optimization+emission time
> mean_jit_time - for sum of functions+inlining+optimization+emission time
> stddev_jit_time - for sum of functions+inlining+optimization+emission time
> jit_functions
> jit_generation_time
> jit_inlining_count
> jit_inlining_time
> jit_optimization_count
> jit_optimization_time
> jit_emission_count
> jit_emission_time
>
> That is, we'd get the more detailed timings across the total time, but
> not on the details. But that might be overkill.

I also thought about it but it seems overkill.  pg_stat_statements view is
already very big, and I think that the JIT time should be somewhat stable, at
least compared to how much a query execution time can vary depending on the
parameters.  This approach would also be a bit useless if you change the
costing of underlying JIT operation.

> But -- here's an updated patched based on Option 2.

Thanks!

Code-wide, the patch looks good.  For the doc, it seems that you documented
jit_inlining_count three times rather than documenting jit_optimization_count
and jit_emission_count.

Oops, thanks and fixed.


I don't think we can add tests there, and having a test for every new counter
being >= 0 seems entirely useless, however there should be a new test added for
the "oldextversions" test to make sure that there's no issue with old SQL / new
shlib compatibility.  And looking at it I see that it was already missed for
version 1.9 :(

Indeed. Fixed here.

Michael had already applied a patch that took us to 1.10 and added that test, 
so I've just updated it here. I don't think we normally bump the version twice 
in the same day, so I just merged the SQL script changes as well.

PFA a "final" version for the CI to run.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


pg_stat_statements_jit_doc_v1.diff
Description: pg_stat_statements_jit_doc_v1.diff


Re: avoid multiple hard links to same WAL file after a crash

2022-04-08 Thread Justin Pryzby
On Fri, Apr 08, 2022 at 09:00:36PM -0400, Robert Haas wrote:
> > I think there might be another problem.  The man page for rename() seems to
> > indicate that overwriting an existing file also introduces a window where
> > the old and new path are hard links to the same file.  This isn't a problem
> > for the WAL files because we should never be overwriting an existing one,
> > but I wonder if it's a problem for other code paths.  My guess is that many
> > code paths that overwrite an existing file are first writing changes to a
> > temporary file before atomically replacing the original.  Those paths are
> > likely okay, too, as you can usually just discard any existing temporary
> > files.
> 
> I wonder if this is really true. I thought rename() was supposed to be atomic.

Looks like it's atomic in that it's not like cp + rm, but not atomic the other
way you want.

|   If newpath already exists, it will be atomically replaced, so that there is
|   no point at which another process attempting to access newpath will find it
|   missing.  However, there will probably be a window in which both oldpath
|   and newpath refer to the file being renamed.
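
For illustration (not from this thread, and not PostgreSQL's durable_rename()),
here is a minimal standalone sketch of the usual "write a temporary file, then
rename() it into place" idiom being discussed, with error handling deliberately
simplified:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Atomically replace "path" with "data": readers always see either the old or
 * the new contents.  Per the man page quoted above, there may still be a brief
 * window in which both the temporary name and "path" refer to the same file. */
static void
replace_file_atomically(const char *path, const char *data, size_t len)
{
	char		tmppath[1024];
	int			fd;

	snprintf(tmppath, sizeof(tmppath), "%s.tmp.%d", path, (int) getpid());

	fd = open(tmppath, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0 ||
		write(fd, data, len) != (ssize_t) len ||
		fsync(fd) != 0 ||
		close(fd) != 0)
	{
		perror(tmppath);
		exit(1);
	}

	if (rename(tmppath, path) != 0)
	{
		perror("rename");
		exit(1);
	}
}

Any leftover "*.tmp.*" file from a crash can simply be discarded, which is why
code paths built on this pattern are largely unaffected by the hard-link window.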




Re: avoid multiple hard links to same WAL file after a crash

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 12:53 PM Nathan Bossart  wrote:
> On Fri, Apr 08, 2022 at 10:38:03AM -0400, Robert Haas wrote:
> > I see that durable_rename_excl() has the following comment: "Similar
> > to durable_rename(), except that this routine tries (but does not
> > guarantee) not to overwrite the target file." If those are the desired
> > semantics, we could achieve them more simply and more safely by just
> > trying to stat() the target file and then, if it's not found, call
> > durable_rename(). I think that would be a heck of a lot safer than
> > what this function is doing right now.
>
> IIUC it actually does guarantee that you won't overwrite the target file
> when HAVE_WORKING_LINK is defined.  If not, it provides no guarantees at
> all.  Using stat() before rename() would therefore weaken this check for
> systems with working link(), but it'd probably strengthen it for systems
> without a working link().

Sure, but a guarantee that happens on only some systems isn't worth
much. And, if it comes at the price of potentially having multiple
hard links to the same file in obscure situations, that seems like it
could easily cause more problems than this whole scheme can ever hope
to solve.
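
As a hedged sketch of the stat()-then-durable_rename() idea from the quoted text
above (illustrative backend-style code only, not the actual patch; the function
name is hypothetical, and it keeps the inherent race between the stat() and the
rename()):

#include "postgres.h"

#include <sys/stat.h>

#include "storage/fd.h"

/* Hypothetical replacement for durable_rename_excl(): try not to clobber an
 * existing target, then fall through to a plain durable_rename().  The check
 * is only advisory; another process could still create newfile between the
 * stat() and the rename(). */
static int
durable_rename_no_overwrite(const char *oldfile, const char *newfile, int elevel)
{
	struct stat st;

	if (stat(newfile, &st) == 0)
	{
		ereport(elevel,
				(errmsg("file \"%s\" already exists, not overwriting", newfile)));
		return -1;
	}

	return durable_rename(oldfile, newfile, elevel);
}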

> I think there might be another problem.  The man page for rename() seems to
> indicate that overwriting an existing file also introduces a window where
> the old and new path are hard links to the same file.  This isn't a problem
> for the WAL files because we should never be overwriting an existing one,
> but I wonder if it's a problem for other code paths.  My guess is that many
> code paths that overwrite an existing file are first writing changes to a
> temporary file before atomically replacing the original.  Those paths are
> likely okay, too, as you can usually just discard any existing temporary
> files.

I wonder if this is really true. I thought rename() was supposed to be atomic.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-08 Thread Andres Freund
Hi,

On 2022-04-08 17:55:51 -0400, Tom Lane wrote:
> I tried adjusting the patch so it does guarantee that (as attached),
> and in two out of two tries it got past the archive_cleanup_command
> failure but then hung up waiting for standby2 to promote.

Adding

$node_standby->safe_psql('postgres', "SELECT pg_switch_wal()");
just after
$node_standby2->start;

makes the tests pass here.


What is that second test really testing?

# Check the presence of temporary files specifically generated during
# archive recovery.  To ensure the presence of the temporary history
# file, switch to a timeline large enough to allow a standby to recover
# a history file from an archive.  As this requires at least two timeline
# switches, promote the existing standby first.  Then create a second
# standby based on the promoted one.  Finally, the second standby is
# promoted.

Note "Then create a second standby based on the promoted one." - but that's
not actually what's happening:

$node_standby2->init_from_backup($node_primary, $backup_name,
has_restoring => 1);

It's created from the original primary, not the first standby, as the
description says...

Both nodes get promoted independently, in a run without valgrind:

standby:
2022-04-08 17:23:42.966 PDT [2463835][startup][1/0:0][] LOG:  INSERT @ 
0/458:  - XLOG/END_OF_RECOVERY: tli 2; prev tli 1; time 2022-04-08 
17:23:42.96686-07
2022-04-08 17:23:42.966 PDT [2463835][startup][1/0:0][] LOG:  xlog flush 
request 0/458; write 0/400; flush 0/400

standby2:
2022-04-08 17:23:43.307 PDT [2463999][startup][1/0:0][] LOG:  INSERT @ 
0/458:  - XLOG/END_OF_RECOVERY: tli 3; prev tli 1; time 2022-04-08 
17:23:43.307443-07
2022-04-08 17:23:43.307 PDT [2463999][startup][1/0:0][] LOG:  xlog flush 
request 0/458; write 0/400; flush 0/400

except that standby2 can't choose tli 2 because it finds it used.

Sure looks like something is funky with that test.


But I think there's also something funky in the prefetching logic. I think it
may attempt restoring during prefetching somehow, even though there's code
that appears to try to prevent that?

on standby2 I can see replay progress like the following:
2022-04-08 17:02:12.310 PDT [2441453][startup][1/0:0][] LOG:  REDO @ 0/3024488; 
LSN 0/30244C8: prev 0/3024448; xid 725; len 3; blkref #0: rel 1663/5/16384, blk 
4 - Heap/INSERT: off 60 flags 0x00
2022-04-08 17:02:12.311 PDT [2441453][startup][1/0:0][] DEBUG:  record known 
xact 725 latestObservedXid 725
2022-04-08 17:02:12.311 PDT [2441453][startup][1/0:0][] CONTEXT:  WAL redo at 
0/3024488 for Heap/INSERT: off 60 flags 0x00; blkref #0: rel 1663/5/16384, blk 4
2022-04-08 17:02:12.312 PDT [2441453][startup][1/0:0][] DEBUG:  executing 
restore command "cp 
"/home/andres/build/postgres/dev-assert/vpath/src/test/recovery/tmp_check/t_002_archiving_primary_data/archives/00010004"
 "pg_wal/RECOVERYXLOG""
2022-04-08 17:02:13.855 PDT [2441453][startup][1/0:0][] DEBUG:  could not 
restore file "00010004" from archive: child process exited with 
exit code 1
2022-04-08 17:02:13.855 PDT [2441453][startup][1/0:0][] DEBUG:  could not open 
file "pg_wal/00010004": No such file or directory
2022-04-08 17:02:13.856 PDT [2441453][startup][1/0:0][] LOG:  REDO @ 0/30244C8; 
LSN 0/3024508: prev 0/3024488; xid 725; len 3; blkref #0: rel 1663/5/16384, blk 
4 - Heap/INSERT: off 61 flags 0x00
2022-04-08 17:02:13.856 PDT [2441453][startup][1/0:0][] DEBUG:  record known 
xact 725 latestObservedXid 725
2022-04-08 17:02:13.856 PDT [2441453][startup][1/0:0][] CONTEXT:  WAL redo at 
0/30244C8 for Heap/INSERT: off 61 flags 0x00; blkref #0: rel 1663/5/16384, blk 4
2022-04-08 17:02:13.857 PDT [2441453][startup][1/0:0][] DEBUG:  executing 
restore command "cp 
"/home/andres/build/postgres/dev-assert/vpath/src/test/recovery/tmp_check/t_002_archiving_primary_data/archives/00010004"
 "pg_wal/RECOVERYXLOG""
2022-04-08 17:02:15.413 PDT [2441453][startup][1/0:0][] DEBUG:  could not 
restore file "00010004" from archive: child process exited with 
exit code 1
2022-04-08 17:02:15.413 PDT [2441453][startup][1/0:0][] DEBUG:  could not open 
file "pg_wal/00010004": No such file or directory
2022-04-08 17:02:15.414 PDT [2441453][startup][1/0:0][] LOG:  REDO @ 0/3024508; 
LSN 0/3024548: prev 0/30244C8; xid 725; len 3; blkref #0: rel 1663/5/16384, blk 
4 - Heap/INSERT: off 62 flags 0x00
2022-04-08 17:02:15.414 PDT [2441453][startup][1/0:0][] DEBUG:  record known 
xact 725 latestObservedXid 725
2022-04-08 17:02:15.414 PDT [2441453][startup][1/0:0][] CONTEXT:  WAL redo at 
0/3024508 for Heap/INSERT: off 62 flags 0x00; blkref #0: rel 1663/5/16384, blk 4
2022-04-08 17:02:15.415 PDT [2441453][startup][1/0:0][] DEBUG:  executing 
restore command "cp 
"/home/andres/build/postgres/dev-assert/vpath/src/test/recovery/tmp_check/t_002_archiving_primary_data/archives/00010004"
 "pg_wal/RECOVERYXLOG""

note 

Re: Mingw task for Cirrus CI

2022-04-08 Thread Justin Pryzby
On Thu, Apr 07, 2022 at 10:10:21AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-04-06 11:03:37 -0400, Andrew Dunstan wrote:
> > On 3/30/22 20:26, Andres Freund wrote:
> > > Could you try using dash to invoke configure here, and whether it makes 
> > > configure faster?
> > I got weird failures re libxml/parser.h when I tried with dash. See
> >  (It would be nice if we
> > could see config.log on failure.)
> 
> Since dash won't help us to get the build time down sufficiently, and the
> tests don't pass without a separate build tree, I looked at what makes
> config/prep_buildtree so slow.
> 
> It's largely just bad code. The slowest part are spawning one expr and mkdir
> -p for each directory. One 'cmp' for each makefile doesn't help either.
> 
> The expr can be replaced with
>   subdir=${item#$sourcetree}
> that's afaics posix syntax ([1]), not bash.
> 
> Spawning one mkdir for each directory can be replaced by a single mkdir
> invocation with all the directories. On my linux workstation that gets the
> time for the first loop down from 1005ms to 38ms, really.

Even better?

(cd "$sourcetree" && find . -print |grep -E '/Makefile$|/GNUmakefile$' |grep -v 
"$sourcetree/doc/src/sgml/images/" |xargs tar c) |
(cd "$buildtree" && tar x)




Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-08 Thread Andres Freund
Hi,

On 2022-04-08 17:55:51 -0400, Tom Lane wrote:
> After seeing skink's results, I tried running that test under valgrind
> here, and it fails just like that every time.  skink's history allows
> us to bound the failure introduction between 79b716cfb7 and
> d7ab2a9a3c, which I think makes it just about certain that it was
> 5dc0418fab (Prefetch data referenced by the WAL, take II), though I've
> not bisected to be 100% sure.

I've tested it, it's 5dc0418fab that makes the difference. I reduced the cycle
time by making initdb not go through valgrind, but have normal postgres
instances go through it.


> On the whole, I'm not sure that the WAL prefetch logic is noticeably
> more stable than when we booted it out last year :-(.

IDK. Last year's issues seem to have largely been caused by a flaky
machine. And a bug, if it's that, in some archiving corner case that's not
normally reached during tests...

Greetings,

Andres Freund




Re: Mingw task for Cirrus CI

2022-04-08 Thread Andres Freund
Hi,

On 2022-04-08 17:04:34 +0200, Alvaro Herrera wrote:
> > Since dash won't help us to get the build time down sufficiently, and the
> > tests don't pass without a separate build tree, I looked at what makes
> > config/prep_buildtree so slow.
>
> Maybe we can replace prep_buildtree with a Perl script.  Surely that
> should be faster.

Currently building doesn't depend on perl :(

I think the improvements that I suggested are big enough that they're worth
doing on their own, particularly for windows, but also other OSs.


I just realized that the second find is pretty expensive compared to the
first.

time find "$sourcetree" -type d \( \( -name CVS -prune \) -o \( -name .git 
-prune \) -o -print \) | grep -v "$sourcetree/doc/src/sgml/\+" > /dev/null
real0m0.019s
user0m0.008s
sys 0m0.017s

second:
time find "$sourcetree" -name Makefile -print -o -name GNUmakefile -print | 
grep -v "$sourcetree/doc/src/sgml/images/" > /dev/null

real0m0.118s
user0m0.071s
sys 0m0.053s

I think we could just obsolete the second find, by checking for the existence
of Makefile / GNUmakefile in the first loop...


The invocation of ln -s is quite measurable - looks like it's mostly the
process startup overhead (on linux, at least). Doing a ln --version > /dev/null
each iteration takes about the same time as actually creating the symlinks.
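
As a purely illustrative aside (not something from this thread), the
process-startup effect is easy to demonstrate with a tiny C micro-benchmark
comparing the symlink(2) syscall against spawning "ln -s" through a shell; the
link names and target used here are arbitrary:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

static double
elapsed(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int
main(void)
{
	struct timespec t0, t1, t2;
	char		name[64], cmd[128];

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < 100; i++)
	{
		snprintf(name, sizeof(name), "link_syscall_%d", i);
		symlink("/etc/hostname", name);	/* direct syscall */
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	for (int i = 0; i < 100; i++)
	{
		snprintf(cmd, sizeof(cmd), "ln -s /etc/hostname link_exec_%d", i);
		system(cmd);					/* fork + exec per link */
	}
	clock_gettime(CLOCK_MONOTONIC, &t2);

	printf("symlink(2): %.3fs, ln -s via shell: %.3fs\n",
		   elapsed(t0, t1), elapsed(t1, t2));
	return 0;
}

system() adds a shell on top of ln, so the second number overstates things a
bit, but the point stands: the per-process startup cost dwarfs the syscall.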

Greetings,

Andres Freund




Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2022 at 1:29 PM Robert Haas  wrote:
> Hmm. I wonder if we could teach the system to figure out which of
> those things is happening. In the case that I'm worried about, when
> we're considering growing the line pointer array, either the line
> pointers will be dead or the line pointers will be used but the tuples
> to which they point will be dead. In the case you describe here, there
> should be very few dead tuples or line pointers in the page. Maybe
> when the number of line pointers starts to get big, we refuse to add
> more without checking the number of dead tuples and dead line pointers
> and verifying that those numbers are still small. Or, uh, something.

It seems like the central idea is that we think in terms of "versions
per logical row", even in low level code that traditionally hasn't
made those kinds of distinctions.

Ideally we could structure pruning a little bit more like a B-Tree
page split, where there is an explicit "incoming tuple" that won't fit
(without splitting the page, or maybe doing some kind of deletion). If
the so-called incoming tuple that we'd rather like to fit on the page
is an insert of an unrelated row, don't allow it (don't prune, give
up). But if it's an update (especially a hot update), be much more
permissive about allowing it, and/or going ahead with pruning in order
to make sure it happens.

I like Andres' idea of altering LP_REDIRECTs just to be able to use up
lower line pointers first. Or preserving a few extra LP_UNUSED items
on insert. Those seem promising to me.

> One fly in the ointment is that if we refuse to expand the line
> pointer array, we might extend the relation instead, which is another
> kind of bloat and thus not great. But if the line pointer array is
> somehow filling up with tons of dead tuples, we're going to have to
> extend the relation anyway. I suspect that in some circumstances it's
> better to just accept that outcome and hope that it leads to some
> pages becoming empty, thus allowing their line pointer arrays to be
> reset.

I agree. Sometimes the problem is that we don't cut our losses when we
should -- sometimes just accepting a limited downside is the right
thing to do. Like with the FSM; we diligently use every last scrap of
free space, without concern for the bigger picture. It's penny-wise,
pound-foolish.

-- 
Peter Geoghegan




Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-08 Thread Tom Lane
Andres Freund  writes:
> On 2022-04-07 13:57:45 -0400, Tom Lane wrote:
>> Yeah, with only one instance it could just be cosmic rays or something.
>> However, assuming it is real, I guess I wonder why we don't say
>> CHECKPOINT_FORCE in standby mode too.

> I guess it might partially be that restartpoints require a checkpoint to have
> happened on the primary. If we used FORCE, we'd have to wait till the next
> checkpoint on the primary, which'd be a problem if it's e.g. a manually issued
> CHECKPOINT; before shutting the standby down.

After seeing skink's results, I tried running that test under valgrind
here, and it fails just like that every time.  skink's history allows
us to bound the failure introduction between 79b716cfb7 and
d7ab2a9a3c, which I think makes it just about certain that it was
5dc0418fab (Prefetch data referenced by the WAL, take II), though I've
not bisected to be 100% sure.

Adding some debug printouts to ExecuteRecoveryCommand convinces me
that indeed the archive_cleanup_command is NOT getting called by the
problematic CHECKPOINT command.  I surmise based on Andres' comment
above that the standby isn't making a restartpoint for lack of
an available primary checkpoint, which looks to me like it could be
a pre-existing bug in the test case: it's sure not doing anything to
guarantee that the primary's checkpoint record has reached the standby.

I tried adjusting the patch so it does guarantee that (as attached),
and in two out of two tries it got past the archive_cleanup_command
failure but then hung up waiting for standby2 to promote.

On the whole, I'm not sure that the WAL prefetch logic is noticeably
more stable than when we booted it out last year :-(.  However, I also
wonder why it is that this test case wasn't occasionally failing already.

regards, tom lane

diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index c8f5ffbaf0..1032d8a388 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -44,13 +44,14 @@ $node_standby->start;
 # Create some content on primary
 $node_primary->safe_psql('postgres',
 	"CREATE TABLE tab_int AS SELECT generate_series(1,1000) AS a");
-my $current_lsn =
-  $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
 
 # Note the presence of this checkpoint for the archive_cleanup_command
 # check done below, before switching to a new segment.
 $node_primary->safe_psql('postgres', "CHECKPOINT");
 
+my $current_lsn =
+  $node_primary->safe_psql('postgres', "SELECT pg_current_wal_lsn();");
+
 # Force archiving of WAL file to make it present on primary
 $node_primary->safe_psql('postgres', "SELECT pg_switch_wal()");
 


Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2022 at 2:06 PM Andres Freund  wrote:
> It's not hard to hit scenarios where pages are effectively unusable, because
> they have close to 291 dead items, without autovacuum triggering (or
> autovacuum just taking a while).

I think that this is mostly a problem with HOT-updates, and regular
updates to a lesser degree. Deletes seem less troublesome.

I find that it's useful to think in terms of the high watermark number
of versions required for a given logical row over time. It's probably
quite rare for most individual logical rows to truly require more than
2 or 3 versions per row at the same time, to serve queries. Even in
update-heavy tables. And without doing anything fancy with the
definition of HeapTupleSatisfiesVacuum(). There are important
exceptions, certainly, but overall I think that we're still not doing
good enough with these easier cases.

The high watermark number of versions is probably going to be
significantly greater than the typical number of versions for the same
row. So maybe we give up on keeping a row on its original heap block
today, all because of a once-off (or very rare) event where we needed
slightly more extra space for only a fraction of a second.

The tell-tale sign of these kinds of problems can sometimes be seen
with synthetic, rate-limited benchmarks. If it takes a very long time
for the problem to grow, but nothing about the workload really ever
changes, then that suggests problems that have this quality.  The
probability of any given logical row being moved to another heap block
is very low. And yet it is inevitable that many (even all) will, given
enough time, given enough opportunities to get unlucky.

> This has become a bit more pronounced with vacuum skipping index cleanup when
> there's "just a few" dead items - if all your updates concentrate in a small
> region, 2% of the whole relation size isn't actually that small.

The 2% threshold was chosen based on the observation that it was below
the effective threshold where autovacuum just won't ever launch
anything on a moderate sized table (unless you set
autovacuum_vacuum_scale_factor to something absurdly low). The real
problem is that IMV. That's why I think that we need to drive it based
primarily on page-level characteristics. While effectively ignoring
pages that are all-visible when deciding if enough bloat is present to
necessitate vacuuming.

> 1) It's kind of OK for heap-only tuples to get a high OffsetNumber - we can
>reclaim them during pruning once they're dead. They don't leave behind a
>dead item that's unreclaimable until the next vacuum with an index cleanup
>pass.

I like the general direction here, but this particular idea doesn't
seem like a winner.

> 2) Arguably the OffsetNumber of a redirect target can be changed. It might
>break careless uses of WHERE ctid = ... though (which likely are already
>broken, just harder to hit).

That makes perfect sense to me, though.

> a) heap_page_prune_prune() should take the number of used items into account
>when deciding whether to prune. Right now we trigger hot pruning based on
>the number of items only if PageGetMaxOffsetNumber(page) >=
>MaxHeapTuplesPerPage. But because it requires a vacuum to reclaim an ItemId
>used for a root tuple, we should trigger HOT pruning when it might lower
>which OffsetNumber get used.

Unsure about this.

> b) heap_page_prune_prune() should be triggered in more paths. E.g. when
>inserting / updating, we should prune if it allows us to avoid using a high
>OffsetNumber.

Unsure about this too.

I prototyped a design that gives individual backends soft ownership of
heap blocks that were recently allocated, and later prunes the heap
page when it fills [1]. Useful for aborted transactions, where it
preserves locality -- leaving aborted tuples behind makes their space
ultimately reused for unrelated inserts, which is bad. But eager
pruning allows the inserter to leave behind more or less pristine heap
pages, which don't need to be pruned later on.

> c) What if we left some percentage of ItemIds unused, when looking for the
>OffsetNumber of a new HOT row version? That'd make it more likely for
>non-HOT updates and inserts to fit onto the page, without permanently
>increasing the size of the line pointer array.

That sounds promising.

[1] 
https://postgr.es/m/cah2-wzm-vhveqyth8hlyyho2wdg8ecrm0upqjwjap6bovfe...@mail.gmail.com
--
Peter Geoghegan




Re: MERGE bug report

2022-04-08 Thread Alvaro Herrera
On 2022-Apr-06, Richard Guo wrote:

> That's right. The varattno is set to zero for whole-row Var. And in this
> case these whole-row Vars are not included in the targetlist.
> 
> Attached is an attempt for the fix.

Wow, this is very interesting.  I was surprised that this patch was
necessary at all -- I mean, if wholerow refs don't work, then why do
references to any other columns work?  The answer is that parse_merge.c
is already setting up the subplan's targetlist by expanding all vars of
the source relation.  I then remembered than in Simon's (or Pavan's)
original coding, parse_merge.c had a hack to include a var with the
source's wholerow in that targetlist, which I had later removed ...

I eventually realized that there's no need for parse_merge.c to expand
the source rel at all, and indeed it's wasteful: we can just let
preprocess_targetlist include the vars that are referenced by either
quals or each action's targetlist instead.  That led me to the attached
patch, which is not commit-quality yet but it should show what I have in
mind.

I added a test query to tickle this problematic case.

Another point, not completely connected to this bug but appearing in the
same function, is that we have some redundant code: we can just let the
stanza for UPDATE/DELETE do the identity columns dance.  This saves a
few lines in the MERGE-specific stanza there, which was doing exactly
the same thing.  (There's a difference in the "inh" test, but I think
that was just outdated.)

I also discovered that the comment for fix_join_expr needed an update,
since it doesn't mention MERGE, and it does mention all other situations
in which it is used.  Added that too.


One more thing, about the comment regarding "aggregates, window functions and
placeholder vars": that was relevant and correct when only the qual of
each action was being handled (i.e., Richard's patch).  Now that we're
also handling the action's targetlist, I think I need to put the PVC
flags back.  But no tests broke, which probably means we also need some
additional test cases.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 6ea3505646..fa5969bbd5 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -2763,6 +2763,10 @@ search_indexed_tlist_for_sortgroupref(Expr *node,
  *	  to-be-updated relation) alone. Correspondingly inner_itlist is to be
  *	  EXCLUDED elements, outer_itlist = NULL and acceptable_rel the target
  *	  relation.
+ * 4) MERGE.  In this case, references to the source relation are to be
+ *replaced with INNER_VAR references, and target Vars (the to-be-
+ *modified relation) are left alone. So inner_itlist is to be
+ *the source relation and acceptable_rel the target relatin.
  *
  * 'clauses' is the targetlist or list of join clauses
  * 'outer_itlist' is the indexed target list of the outer join relation,
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 99ab3d7559..c1c3067365 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -107,14 +107,15 @@ preprocess_targetlist(PlannerInfo *root)
 		root->update_colnos = extract_update_targetlist_colnos(tlist);
 
 	/*
-	 * For non-inherited UPDATE/DELETE, register any junk column(s) needed to
-	 * allow the executor to identify the rows to be updated or deleted.  In
-	 * the inheritance case, we do nothing now, leaving this to be dealt with
-	 * when expand_inherited_rtentry() makes the leaf target relations.  (But
-	 * there might not be any leaf target relations, in which case we must do
-	 * this in distribute_row_identity_vars().)
+	 * For non-inherited UPDATE/DELETE/MERGE, register any junk column(s)
+	 * needed to allow the executor to identify the rows to be updated or
+	 * deleted.  In the inheritance case, we do nothing now, leaving this to
+	 * be dealt with when expand_inherited_rtentry() makes the leaf target
+	 * relations.  (But there might not be any leaf target relations, in which
+	 * case we must do this in distribute_row_identity_vars().)
 	 */
-	if ((command_type == CMD_UPDATE || command_type == CMD_DELETE) &&
+	if ((command_type == CMD_UPDATE || command_type == CMD_DELETE ||
+		 command_type == CMD_MERGE) &&
 		!target_rte->inh)
 	{
 		/* row-identity logic expects to add stuff to processed_tlist */
@@ -125,23 +126,15 @@ preprocess_targetlist(PlannerInfo *root)
 	}
 
 	/*
-	 * For MERGE we need to handle the target list for the target relation,
-	 * and also target list for each action (only INSERT/UPDATE matter).
+	 * For MERGE we also need to handle the target list for each INSERT and
+	 * UPDATE action separately.  In addition, we examine the qual of each
+	 * action and add any Vars there (other than those of the target rel) to
+	 * the subplan targetlist.
 	 */
 	if (command_type == CMD_MERGE)
 	{
 		

Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2022 at 2:18 PM Andres Freund  wrote:
> It's 4 bytes per line pointer, right?

Yeah, it's 4 bytes in Postgres. Most other DB systems only need 2
bytes, which is implemented in exactly the way that you're imagining.

-- 
Peter Geoghegan




Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Andres Freund
Hi,

On 2022-04-08 15:04:37 -0400, Robert Haas wrote:
> I meant wasting space in the page. I think that's a real concern.
> Imagine you allowed 1000 line pointers per page. Each one consumes 2
> bytes.

It's 4 bytes per line pointer, right?

struct ItemIdData {
	unsigned int               lp_off:15;       /*     0: 0  4 */
	unsigned int               lp_flags:2;      /*     0:15  4 */
	unsigned int               lp_len:15;       /*     0:17  4 */

	/* size: 4, cachelines: 1, members: 3 */
	/* last cacheline: 4 bytes */
};

Or am I confusing myself somehow?


I do wish the length of the tuple weren't in ItemIdData, but part of the
tuple, so we'd not waste this space for dead items (I think it'd also simplify
more code than it'd complicate). But ...
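
Purely as an illustration of that idea (not a proposal, and not PostgreSQL's
on-disk format): if lp_len lived in the tuple header instead, a line pointer
could in principle shrink to something like the following two-byte layout,
ignoring details such as 32 kB pages, bit-field portability, and LP_DEAD items:

#include <stdint.h>

typedef struct SlimItemIdData
{
	uint16_t	lp_off:14;		/* byte offset of the tuple within the page */
	uint16_t	lp_flags:2;		/* LP_UNUSED / LP_NORMAL / LP_REDIRECT / LP_DEAD */
} SlimItemIdData;				/* 2 bytes, vs. 4 for today's ItemIdData */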

- Andres




Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Andres Freund
Hi,

On 2022-04-08 09:17:40 -0400, Robert Haas wrote:
> I agree that the value of 291 is pretty much accidental, but it also
> seems fairly generous to me. The bigger you make it, the more space
> you can waste. I must have missed (or failed to understand) previous
> discussions about why raising it would be a good idea.

It's not hard to hit scenarios where pages are effectively unusable, because
they have close to 291 dead items, without autovacuum triggering (or
autovacuum just taking a while). You basically just need updates / deletes to
concentrate in a certain range of the table and have indexing that prevents
HOT updates. Because the overall percentage of dead tuples is low, no
autovacuum is triggered, yet a range of the table contains little but dead
items.  At which point you basically waste 7k bytes (1164 bytes for dead items
IIRC) until a vacuum finally kicks in - way more than what you'd waste if
the number of line items were limited at e.g. 2 x MaxHeapTuplesPerPage.
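
(For readers following along: those numbers can be sanity-checked with a small
back-of-the-envelope program, assuming the default 8 kB BLCKSZ and 64-bit
MAXALIGN.  This is a standalone sketch; the real definition of
MaxHeapTuplesPerPage lives in access/htup_details.h.)

#include <stdio.h>

int
main(void)
{
	const int	blcksz = 8192;		/* default BLCKSZ */
	const int	page_header = 24;	/* SizeOfPageHeaderData */
	const int	tuple_header = 24;	/* MAXALIGN(SizeofHeapTupleHeader), 64-bit */
	const int	item_id = 4;		/* sizeof(ItemIdData) */

	int			max_tuples = (blcksz - page_header) / (tuple_header + item_id);

	printf("MaxHeapTuplesPerPage = %d\n", max_tuples);					/* 291 */
	printf("line pointer array   = %d bytes\n", max_tuples * item_id);	/* 1164 */
	printf("rest of the page     = %d bytes\n",
		   blcksz - page_header - max_tuples * item_id);				/* ~7004 */
	return 0;
}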

This has become a bit more pronounced with vacuum skipping index cleanup when
there's "just a few" dead items - if all your updates concentrate in a small
region, 2% of the whole relation size isn't actually that small.


I wonder if we could reduce the real-world space wastage of the line pointer
array, if we changed the logic about which OffsetNumbers to use during
inserts / updates and made a few tweaks to pruning.

1) It's kind of OK for heap-only tuples to get a high OffsetNumber - we can
   reclaim them during pruning once they're dead. They don't leave behind a
   dead item that's unreclaimable until the next vacuum with an index cleanup
   pass.

2) Arguably the OffsetNumber of a redirect target can be changed. It might
   break careless uses of WHERE ctid = ... though (which likely are already
   broken, just harder to hit).

These leads me to a few potential improvements:

a) heap_page_prune_prune() should take the number of used items into account
   when deciding whether to prune. Right now we trigger hot pruning based on
   the number of items only if PageGetMaxOffsetNumber(page) >=
   MaxHeapTuplesPerPage. But because it requires a vacuum to reclaim an ItemId
   used for a root tuple, we should trigger HOT pruning when it might lower
   which OffsetNumber get used.

b) heap_page_prune_prune() should be triggered in more paths. E.g. when
   inserting / updating, we should prune if it allows us to avoid using a high
   OffsetNumber.

c) What if we left some percentage of ItemIds unused, when looking for the
   OffsetNumber of a new HOT row version? That'd make it more likely for
   non-HOT updates and inserts to fit onto the page, without permanently
   increasing the size of the line pointer array.

d) If we think 2) is acceptable, we could move the targets of redirects to
   make space for new root tuples, without increasing the permanent size of
   the line pointer array.

Crazy?

Greetings,

Andres Freund




Re: Pre-allocating WAL files

2022-04-08 Thread Nathan Bossart
On Thu, Mar 17, 2022 at 04:12:12PM -0700, Nathan Bossart wrote:
> It seems unlikely that this will be committed for v15, so I've adjusted the
> commitfest entry to v16 and moved it to the next commitfest.

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 3781795f9b4e448df6bdd24d5cd7c0743b5e2944 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 10 Nov 2021 18:35:14 +
Subject: [PATCH v8 1/2] Move WAL segment creation logic to its own function.

---
 src/backend/access/transam/xlog.c | 103 +--
 src/backend/storage/file/fd.c | 114 ++
 src/include/storage/fd.h  |   1 +
 3 files changed, 116 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a7814d4019..87d71e2008 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2918,11 +2918,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	 bool *added, char *path)
 {
 	char		tmppath[MAXPGPATH];
-	PGAlignedXLogBlock zbuffer;
 	XLogSegNo	installed_segno;
 	XLogSegNo	max_segno;
 	int			fd;
-	int			save_errno;
 
 	Assert(logtli != 0);
 
@@ -2952,106 +2950,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	elog(DEBUG2, "creating and filling new WAL file");
 
 	snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
-
-	unlink(tmppath);
-
-	/* do not use get_sync_bit() here --- want to fsync only at end of fill */
-	fd = BasicOpenFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
-	if (fd < 0)
-		ereport(ERROR,
-(errcode_for_file_access(),
- errmsg("could not create file \"%s\": %m", tmppath)));
-
-	memset(zbuffer.data, 0, XLOG_BLCKSZ);
-
-	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
-	save_errno = 0;
-	if (wal_init_zero)
-	{
-		struct iovec iov[PG_IOV_MAX];
-		int			blocks;
-
-		/*
-		 * Zero-fill the file.  With this setting, we do this the hard way to
-		 * ensure that all the file space has really been allocated.  On
-		 * platforms that allow "holes" in files, just seeking to the end
-		 * doesn't allocate intermediate space.  This way, we know that we
-		 * have all the space and (after the fsync below) that all the
-		 * indirect blocks are down on disk.  Therefore, fdatasync(2) or
-		 * O_DSYNC will be sufficient to sync future writes to the log file.
-		 */
-
-		/* Prepare to write out a lot of copies of our zero buffer at once. */
-		for (int i = 0; i < lengthof(iov); ++i)
-		{
-			iov[i].iov_base = zbuffer.data;
-			iov[i].iov_len = XLOG_BLCKSZ;
-		}
-
-		/* Loop, writing as many blocks as we can for each system call. */
-		blocks = wal_segment_size / XLOG_BLCKSZ;
-		for (int i = 0; i < blocks;)
-		{
-			int			iovcnt = Min(blocks - i, lengthof(iov));
-			off_t		offset = i * XLOG_BLCKSZ;
-
-			if (pg_pwritev_with_retry(fd, iov, iovcnt, offset) < 0)
-			{
-save_errno = errno;
-break;
-			}
-
-			i += iovcnt;
-		}
-	}
-	else
-	{
-		/*
-		 * Otherwise, seeking to the end and writing a solitary byte is
-		 * enough.
-		 */
-		errno = 0;
-		if (pg_pwrite(fd, zbuffer.data, 1, wal_segment_size - 1) != 1)
-		{
-			/* if write didn't set errno, assume no disk space */
-			save_errno = errno ? errno : ENOSPC;
-		}
-	}
-	pgstat_report_wait_end();
-
-	if (save_errno)
-	{
-		/*
-		 * If we fail to make the file, delete it to release disk space
-		 */
-		unlink(tmppath);
-
-		close(fd);
-
-		errno = save_errno;
-
-		ereport(ERROR,
-(errcode_for_file_access(),
- errmsg("could not write to file \"%s\": %m", tmppath)));
-	}
-
-	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
-	if (pg_fsync(fd) != 0)
-	{
-		int			save_errno = errno;
-
-		close(fd);
-		errno = save_errno;
-		ereport(ERROR,
-(errcode_for_file_access(),
- errmsg("could not fsync file \"%s\": %m", tmppath)));
-	}
-	pgstat_report_wait_end();
-
-	if (close(fd) != 0)
-		ereport(ERROR,
-(errcode_for_file_access(),
- errmsg("could not close file \"%s\": %m", tmppath)));
+	CreateEmptyWalSegment(tmppath);
 
 	/*
 	 * Now move the segment into place with its final name.  Cope with
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 14b77f2861..4efc46460e 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -3891,3 +3891,117 @@ pg_pwritev_with_retry(int fd, const struct iovec *iov, int iovcnt, off_t offset)
 
 	return sum;
 }
+
+/*
+ * CreateEmptyWalSegment
+ *
+ * Create a new file that can be used as a new WAL segment.  The caller is
+ * responsible for installing the new file in pg_wal.
+ */
+void
+CreateEmptyWalSegment(const char *path)
+{
+	PGAlignedXLogBlock zbuffer;
+	int			fd;
+	int			save_errno;
+
+	unlink(path);
+
+	/* do not use get_sync_bit() here --- want to fsync only at end of fill */
+	fd = BasicOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
+	if (fd < 0)
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not create 

Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 3:31 PM Peter Geoghegan  wrote:
> What if we miss the opportunity to systematically keep successor
> versions of a given logical row on the same heap page over time, due
> only to the current low MaxHeapLinePointersPerPage limit of 291? If we
> had only been able to "absorb" just a few extra versions in the short
> term, we would have had stability (in the sense of being able to
> preserve locality among related logical rows) in the long term. We
> could have kept everything together, if only we didn't overreact to
> what were actually short term, rare perturbations.

Hmm. I wonder if we could teach the system to figure out which of
those things is happening. In the case that I'm worried about, when
we're considering growing the line pointer array, either the line
pointers will be dead or the line pointers will be used but the tuples
to which they point will be dead. In the case you describe here, there
should be very few dead tuples or line pointers in the page. Maybe
when the number of line pointers starts to get big, we refuse to add
more without checking the number of dead tuples and dead line pointers
and verifying that those numbers are still small. Or, uh, something.

One fly in the ointment is that if we refuse to expand the line
pointer array, we might extend the relation instead, which is another
kind of bloat and thus not great. But if the line pointer array is
somehow filling up with tons of dead tuples, we're going to have to
extend the relation anyway. I suspect that in some circumstances it's
better to just accept that outcome and hope that it leads to some
pages becoming empty, thus allowing their line pointer arrays to be
reset.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Frontend error logging style

2022-04-08 Thread Tom Lane
I wrote:
> One other loose end is bothering me: I stuck with logging.h's
> original choice to put "if (likely())" or "if (unlikely())"
> conditionals into the macros, but I rather suspect that that's
> just a waste.  I think we should put a centralized level check
> into logging.c, and get rid of at least the "if (likely())"
> checks, because those are going to succeed approximately 100.0%
> of the time.  Maybe there's an argument for keeping the unlikely()
> ones.

Concretely, something like the attached.  As a simple check,
I looked at the compiled size of pg_dump.  It went from

   text    data     bss     dec     hex filename
 380298    4008    1384  385690   5e29a /home/postgres/testversion/bin/pg_dump

to

   text    data     bss     dec     hex filename
 374954    4008    1384  380346   5cdba src/bin/pg_dump/pg_dump

for a savings of about 5K or 1.5%.  Not a huge amount, but
not nothing either, especially considering that the existing
coding isn't buying us anything.

regards, tom lane

diff --git a/src/bin/pg_dump/pg_backup_utils.h b/src/bin/pg_dump/pg_backup_utils.h
index 5b1c51554d..8173bb93cf 100644
--- a/src/bin/pg_dump/pg_backup_utils.h
+++ b/src/bin/pg_dump/pg_backup_utils.h
@@ -34,8 +34,7 @@ extern void exit_nicely(int code) pg_attribute_noreturn();
 /* In pg_dump, we modify pg_fatal to call exit_nicely instead of exit */
 #undef pg_fatal
 #define pg_fatal(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_ERROR)) \
-			pg_log_generic(PG_LOG_ERROR, PG_LOG_PRIMARY, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_ERROR, PG_LOG_PRIMARY, __VA_ARGS__); \
 		exit_nicely(1); \
 	} while(0)
 
diff --git a/src/common/logging.c b/src/common/logging.c
index 18d6669f27..8a061f46b4 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -223,6 +223,10 @@ pg_log_generic_v(enum pg_log_level level, enum pg_log_part part,
 	Assert(fmt);
 	Assert(fmt[strlen(fmt) - 1] != '\n');
 
+	/* Do nothing if log level is too low. */
+	if (level < __pg_log_level)
+		return;
+
 	/*
 	 * Flush stdout before output to stderr, to ensure sync even when stdout
 	 * is buffered.
diff --git a/src/include/common/logging.h b/src/include/common/logging.h
index e213bb70d0..35c7c7b976 100644
--- a/src/include/common/logging.h
+++ b/src/include/common/logging.h
@@ -104,48 +104,39 @@ void		pg_log_generic_v(enum pg_log_level level, enum pg_log_part part,
  * pg_log_generic[_v] directly, except perhaps in error interface code.
  */
 #define pg_log_error(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_ERROR)) \
-			pg_log_generic(PG_LOG_ERROR, PG_LOG_PRIMARY, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_ERROR, PG_LOG_PRIMARY, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_error_detail(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_ERROR)) \
-			pg_log_generic(PG_LOG_ERROR, PG_LOG_DETAIL, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_ERROR, PG_LOG_DETAIL, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_error_hint(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_ERROR)) \
-			pg_log_generic(PG_LOG_ERROR, PG_LOG_HINT, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_ERROR, PG_LOG_HINT, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_warning(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_WARNING)) \
-			pg_log_generic(PG_LOG_WARNING, PG_LOG_PRIMARY, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_WARNING, PG_LOG_PRIMARY, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_warning_detail(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_WARNING)) \
-			pg_log_generic(PG_LOG_WARNING, PG_LOG_DETAIL, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_WARNING, PG_LOG_DETAIL, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_warning_hint(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_WARNING)) \
-			pg_log_generic(PG_LOG_WARNING, PG_LOG_HINT, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_WARNING, PG_LOG_HINT, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_info(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_INFO)) \
-			pg_log_generic(PG_LOG_INFO, PG_LOG_PRIMARY, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_INFO, PG_LOG_PRIMARY, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_info_detail(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_INFO)) \
-			pg_log_generic(PG_LOG_INFO, PG_LOG_DETAIL, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_INFO, PG_LOG_DETAIL, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_info_hint(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_INFO)) \
-			pg_log_generic(PG_LOG_INFO, PG_LOG_HINT, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_INFO, PG_LOG_HINT, __VA_ARGS__); \
 	} while(0)
 
 #define pg_log_debug(...) do { \
@@ -167,8 +158,7 @@ void		pg_log_generic_v(enum pg_log_level level, enum pg_log_part part,
  * A common shortcut: pg_log_error() and immediately exit(1).
  */
 #define pg_fatal(...) do { \
-		if (likely(__pg_log_level <= PG_LOG_ERROR)) \
-			pg_log_generic(PG_LOG_ERROR, PG_LOG_PRIMARY, __VA_ARGS__); \
+		pg_log_generic(PG_LOG_ERROR, PG_LOG_PRIMARY, __VA_ARGS__); \
 		exit(1); \
 	} while(0)
 


Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2022-04-08 Thread Andrei Zubkov
Hi,

I've rebased this patch so that it can be applied after 57d6aea00fc.

v14 attached
--
regards, Andrei
From 6c541f3001d952e72e5d865fde09de3fb4f36d10 Mon Sep 17 00:00:00 2001
From: Andrei Zubkov 
Date: Fri, 8 Apr 2022 23:12:55 +0300
Subject: [PATCH] pg_stat_statements: Track statement entry timestamp

This patch adds stats_since and minmax_stats_since columns to the
pg_stat_statements view and pg_stat_statements() function. The new min/max reset
mode for the pg_stat_statements_reset() function is controlled by the
parameter minmax_only.
stat_since column is populated with the current timestamp when a new statement
is added to the pg_stat_statements hashtable. It provides clean information
about statistics collection time interval for each statement. Besides it can be
used by sampling solutions to detect situations when a statement was evicted and
stored again between samples.
Such sampling solution could derive any pg_stat_statements statistic values for
an interval between two samples with the exception of all min/max statistics. To
address this issue this patch adds the ability to reset min/max statistics
independently of the statement reset using the new minmax_only parameter
of the pg_stat_statements_reset(userid oid, dbid oid, queryid bigint,
minmax_only boolean) function. Timestamp of such reset is stored in the
minmax_stats_since field for each statement.
pg_stat_statements_reset() function now returns the timestamp of a reset as a
result.

Discussion:
https://www.postgresql.org/message-id/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru
---
 .../expected/oldextversions.out   | 118 +++---
 .../expected/pg_stat_statements.out   | 361 +-
 .../pg_stat_statements--1.9--1.10.sql |  16 +-
 .../pg_stat_statements/pg_stat_statements.c   | 128 +--
 .../pg_stat_statements/sql/oldextversions.sql |   7 +-
 .../sql/pg_stat_statements.sql| 149 +++-
 doc/src/sgml/pgstatstatements.sgml|  66 +++-
 7 files changed, 637 insertions(+), 208 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out
index efb2049ecff..0634d73bc03 100644
--- a/contrib/pg_stat_statements/expected/oldextversions.out
+++ b/contrib/pg_stat_statements/expected/oldextversions.out
@@ -138,7 +138,7 @@ SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc);
 
 -- New function pg_stat_statement_info, and new function
 -- and view for pg_stat_statements introduced in 1.9
-AlTER EXTENSION pg_stat_statements UPDATE TO '1.9';
+ALTER EXTENSION pg_stat_statements UPDATE TO '1.9';
 SELECT pg_get_functiondef('pg_stat_statements_info'::regproc);
pg_get_functiondef
 -
@@ -194,55 +194,79 @@ SELECT count(*) > 0 AS has_data FROM pg_stat_statements;
  t
 (1 row)
 
+\d pg_stat_statements_info
+  View "public.pg_stat_statements_info"
+   Column|   Type   | Collation | Nullable | Default 
+-+--+---+--+-
+ dealloc | bigint   |   |  | 
+ stats_reset | timestamp with time zone |   |  | 
+
+SELECT pg_get_functiondef('pg_stat_statements_reset'::regproc);
+   pg_get_functiondef   
+
+ CREATE OR REPLACE FUNCTION public.pg_stat_statements_reset(userid oid DEFAULT 0, dbid oid DEFAULT 0, queryid bigint DEFAULT 0)+
+  RETURNS void +
+  LANGUAGE c   +
+  PARALLEL SAFE STRICT +
+ AS '$libdir/pg_stat_statements', $function$pg_stat_statements_reset_1_7$function$ +
+ 
+(1 row)
+
+SET SESSION AUTHORIZATION pg_read_all_stats;
+SELECT pg_stat_statements_reset();
+ERROR:  permission denied for function pg_stat_statements_reset
+RESET SESSION AUTHORIZATION;
 -- New functions and views for pg_stat_statements in 1.10
 AlTER EXTENSION pg_stat_statements UPDATE TO '1.10';
 \d pg_stat_statements
-  View "public.pg_stat_statements"
- Column |   Type   | Collation | Nullable | Default 
-+--+---+--+-
- userid | oid  |   

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-04-08 Thread Nathan Bossart
On Wed, Mar 30, 2022 at 09:21:30AM -0700, Nathan Bossart wrote:
> Here is an updated patch set.

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From a92ccf47c9c334eea5b8d07b8fcab7031181c37e Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 15 Feb 2022 09:40:53 -0800
Subject: [PATCH v12 1/2] make more use of get_dirent_type()

---
 src/backend/access/heap/rewriteheap.c |  4 +-
 .../replication/logical/reorderbuffer.c   | 12 +++---
 src/backend/replication/logical/snapbuild.c   |  5 +--
 src/backend/replication/slot.c|  4 +-
 src/backend/storage/file/copydir.c| 21 +++---
 src/backend/storage/file/fd.c | 20 +
 src/backend/utils/misc/guc-file.l | 42 +++
 src/timezone/pgtz.c   |  8 +---
 8 files changed, 48 insertions(+), 68 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 2a53826736..d64d7aae2e 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -113,6 +113,7 @@
 #include "access/xact.h"
 #include "access/xloginsert.h"
 #include "catalog/catalog.h"
+#include "common/file_utils.h"
 #include "lib/ilist.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1213,7 +1214,6 @@ CheckPointLogicalRewriteHeap(void)
 	mappings_dir = AllocateDir("pg_logical/mappings");
 	while ((mapping_de = ReadDir(mappings_dir, "pg_logical/mappings")) != NULL)
 	{
-		struct stat statbuf;
 		Oid			dboid;
 		Oid			relid;
 		XLogRecPtr	lsn;
@@ -1227,7 +1227,7 @@ CheckPointLogicalRewriteHeap(void)
 			continue;
 
 		snprintf(path, sizeof(path), "pg_logical/mappings/%s", mapping_de->d_name);
-		if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode))
+		if (get_dirent_type(path, mapping_de, false, ERROR) != PGFILETYPE_REG)
 			continue;
 
 		/* Skip over files that cannot be ours. */
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5adc016d44..63ef55f3f7 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -91,6 +91,7 @@
 #include "access/xact.h"
 #include "access/xlog_internal.h"
 #include "catalog/catalog.h"
+#include "common/file_utils.h"
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -4408,15 +4409,10 @@ ReorderBufferCleanupSerializedTXNs(const char *slotname)
 {
 	DIR		   *spill_dir;
 	struct dirent *spill_de;
-	struct stat statbuf;
 	char		path[MAXPGPATH * 2 + 12];
 
 	sprintf(path, "pg_replslot/%s", slotname);
 
-	/* we're only handling directories here, skip if it's not ours */
-	if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode))
-		return;
-
 	spill_dir = AllocateDir(path);
 	while ((spill_de = ReadDirExtended(spill_dir, path, INFO)) != NULL)
 	{
@@ -4464,6 +4460,7 @@ StartupReorderBuffer(void)
 {
 	DIR		   *logical_dir;
 	struct dirent *logical_de;
+	char		path[MAXPGPATH * 2 + 12];
 
 	logical_dir = AllocateDir("pg_replslot");
 	while ((logical_de = ReadDir(logical_dir, "pg_replslot")) != NULL)
@@ -4476,6 +4473,11 @@ StartupReorderBuffer(void)
 		if (!ReplicationSlotValidateName(logical_de->d_name, DEBUG2))
 			continue;
 
+		/* we're only handling directories here, skip if it's not ours */
+		sprintf(path, "pg_replslot/%s", logical_de->d_name);
+		if (get_dirent_type(path, logical_de, false, ERROR) != PGFILETYPE_DIR)
+			continue;
+
 		/*
 		 * ok, has to be a surviving logical slot, iterate and delete
 		 * everything starting with xid-*
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 83fca8a77d..df50abfd98 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -123,6 +123,7 @@
 #include "access/heapam_xlog.h"
 #include "access/transam.h"
 #include "access/xact.h"
+#include "common/file_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "replication/logical.h"
@@ -1947,15 +1948,13 @@ CheckPointSnapBuild(void)
 		uint32		hi;
 		uint32		lo;
 		XLogRecPtr	lsn;
-		struct stat statbuf;
 
 		if (strcmp(snap_de->d_name, ".") == 0 ||
 			strcmp(snap_de->d_name, "..") == 0)
 			continue;
 
 		snprintf(path, sizeof(path), "pg_logical/snapshots/%s", snap_de->d_name);
-
-		if (lstat(path, &statbuf) == 0 && !S_ISREG(statbuf.st_mode))
+		if (get_dirent_type(path, snap_de, false, ERROR) != PGFILETYPE_REG)
 		{
 			elog(DEBUG1, "only regular files expected: %s", path);
 			continue;
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index c35ea7c35b..5ee68e71b8 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -41,6 +41,7 @@
 
 #include "access/transam.h"
 #include "access/xlog_internal.h"
+#include "common/file_utils.h"
 #include "common/string.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1485,7 +1486,6 @@ StartupReplicationSlots(void)
 	

Re: avoid multiple hard links to same WAL file after a crash

2022-04-08 Thread Nathan Bossart
On Fri, Apr 08, 2022 at 10:38:03AM -0400, Robert Haas wrote:
> I'd actually be in favor of nuking durable_rename_excl() from orbit
> and putting the file-exists tests in the callers. Otherwise, someone
> might assume that it actually has the semantics that its name
> suggests, which could be pretty disastrous. If we don't want to do
> that, then I'd changing to do the stat-then-durable-rename thing
> internally, so we don't leave hard links lying around in *any* code
> path. Perhaps that's the right answer for the back-branches in any
> case, since there could be third-party code calling this function.

I've attached a patch that simply removes durable_rename_excl() and
replaces existing calls with durable_rename().  I noticed that Andres
expressed similar misgivings about durable_rename_excl() last year [0] [1].
I can create a stat-then-durable-rename version of this for back-patching
if that is still the route we want to go.

[0] https://postgr.es/me/20210318014812.ds2iz4jz5h7la6un%40alap3.anarazel.de
[1] https://postgr.es/m/20210318023004.gz2aejhze2kkkqr2%40alap3.anarazel.de

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From d3c633e19555dc0cf98207ad5e7c08ab9ce85dc0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 8 Apr 2022 11:48:17 -0700
Subject: [PATCH v2 1/1] Remove durable_rename_excl().

durable_rename_excl() attempts to avoid overwriting any existing
files by using link() and unlink(), but it falls back to rename()
on some platforms (e.g., Windows), which offers no such overwrite
protection.  Most callers use durable_rename_excl() just in case
there is an existing file, but in practice there shouldn't be one.
basic_archive uses it to avoid overwriting an archive concurrently
created by another server, but as mentioned above, it will still
overwrite files on some platforms.

Furthermore, failures during durable_rename_excl() can result in
multiple hard links to the same file.  My testing demonstrated that
it was possible to end up with two links to the same file in pg_wal
after a crash just before unlink() during WAL recycling.
Specifically, the test produced links to the same file for the
current WAL file and the next one because the half-recycled WAL
file was re-recycled upon restarting.  This seems likely to lead to
WAL corruption.

This change removes durable_rename_excl() and replaces all existing
calls with durable_rename().  This removes the protection against
accidentally overwriting an existing file, but some platforms are
already living without it, and ordinarily there shouldn't be one.

Author: Nathan Bossart
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/20220407182954.GA1231544%40nathanxps13
---
 contrib/basic_archive/basic_archive.c |  5 ++-
 src/backend/access/transam/timeline.c | 14 +-
 src/backend/access/transam/xlog.c |  8 +---
 src/backend/storage/file/fd.c | 63 ---
 src/include/pg_config_manual.h|  7 ---
 src/include/storage/fd.h  |  1 -
 6 files changed, 7 insertions(+), 91 deletions(-)

diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index e7efbfb9c3..ed33854c57 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -281,9 +281,10 @@ basic_archive_file_internal(const char *file, const char *path)
 
 	/*
 	 * Sync the temporary file to disk and move it to its final destination.
-	 * This will fail if destination already exists.
+	 * Note that this will overwrite any existing file, but this is only
+	 * possible if someone else created the file since the stat() above.
 	 */
-	(void) durable_rename_excl(temp, destination, ERROR);
+	(void) durable_rename(temp, destination, ERROR);
 
 	ereport(DEBUG1,
 			(errmsg("archived \"%s\" via basic_archive", file)));
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index be21968293..128f754e87 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -441,12 +441,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	 * Now move the completed history file into place with its final name.
 	 */
 	TLHistoryFilePath(path, newTLI);
-
-	/*
-	 * Perform the rename using link if available, paranoidly trying to avoid
-	 * overwriting an existing file (there shouldn't be one).
-	 */
-	durable_rename_excl(tmppath, path, ERROR);
+	durable_rename(tmppath, path, ERROR);
 
 	/* The history file can be archived immediately. */
 	if (XLogArchivingActive())
@@ -519,12 +514,7 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
 	 * Now move the completed history file into place with its final name.
 	 */
 	TLHistoryFilePath(path, tli);
-
-	/*
-	 * Perform the rename using link if available, paranoidly trying to avoid
-	 * overwriting an existing file (there shouldn't be one).
-	 */
-	durable_rename_excl(tmppath, path, ERROR);
+	durable_rename(tmppath, path, ERROR);
 }

Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2022 at 12:04 PM Robert Haas  wrote:
> I meant wasting space in the page. I think that's a real concern.
> Imagine you allowed 1000 line pointers per page. Each one consumes 2
> bytes. So now you could have ~25% of each page in the table storing
> dead line pointers. That sounds awful, and just running VACUUM won't
> fix it once it's happened, because the still-live line pointers are
> likely to be at the end of the line pointer array and thus truncating
> it won't necessarily be possible.

I see. That's a legitimate concern, though one that I believe can be
addressed. I have learned to dread any kind of bloat that's
irreversible, no matter how minor it might seem when seen as an
isolated event, so I'm certainly sympathetic to these concerns. You
can make a similar argument in favor of a higher
MaxHeapLinePointersPerPage limit, though -- and that's why I believe
an increase of some kind makes sense. The argument goes like this:

What if we miss the opportunity to systematically keep successor
versions of a given logical row on the same heap page over time, due
only to the current low MaxHeapLinePointersPerPage limit of 291? If we
had only been able to "absorb" just a few extra versions in the short
term, we would have had stability (in the sense of being able to
preserve locality among related logical rows) in the long term. We
could have kept everything together, if only we didn't overreact to
what were actually short term, rare perturbations.

-- 
Peter Geoghegan




Re: Frontend error logging style

2022-04-08 Thread Tom Lane
Daniel Gustafsson  writes:
> On 30 Mar 2022, at 00:38, Tom Lane  wrote:
>> Feel free to work on a followup editing patch though.

> That's my plan, once this lands I'll rebase the comments on top of your work 
> and
> we can have a separate discussion around them then.

The main patch is pushed now.  I addressed the complaint Peter had
about the messages with "Check your installation" pseudo-hints
by getting rid of them; I concur with your observation that those
hints were basically useless.  I also fixed the one place where the
message should clearly be "could not close" not "could not write".
Mostly didn't yield to temptation anywhere else.

One other loose end is bothering me: I stuck with logging.h's
original choice to put "if (likely())" or "if (unlikely())"
conditionals into the macros, but I rather suspect that that's
just a waste.  I think we should put a centralized level check
into logging.c, and get rid of at least the "if (likely())"
checks, because those are going to succeed approximately 100.0%
of the time.  Maybe there's an argument for keeping the unlikely()
ones.

regards, tom lane
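
For illustration, a minimal sketch of that centralized check, assuming it
lands at the top of pg_log_generic_v() in src/common/logging.c (placement
and the use of unlikely() here are only suggestions, not committed code);
the macro side would then drop its conditionals as in the diff earlier in
this digest:

void
pg_log_generic_v(enum pg_log_level level, enum pg_log_part part,
				 const char *pg_restrict fmt, va_list ap)
{
	/* Single, centralized level filter; the macros no longer test it. */
	if (unlikely(__pg_log_level > level))
		return;

	/* ... existing formatting and output code, unchanged ... */
}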




Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 12:57 PM Peter Geoghegan  wrote:
> What do you mean about wasting space? Wasting space on the stack? I
> can't imagine you meant wasting space on the page, since being able to
> accomodate more items on each heap page seems like it would be
> strictly better, barring any unintended weird FSM issues.

I meant wasting space in the page. I think that's a real concern.
Imagine you allowed 1000 line pointers per page. Each one consumes 2
bytes. So now you could have ~25% of each page in the table storing
dead line pointers. That sounds awful, and just running VACUUM won't
fix it once it's happened, because the still-live line pointers are
likely to be at the end of the line pointer array and thus truncating
it won't necessarily be possible.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2022 at 9:44 AM Peter Geoghegan  wrote:
> On Fri, Apr 8, 2022 at 4:38 AM Matthias van de Meent
>  wrote:
> > Yeah, I think we should definately support more line pointers on a
> > heap page, but abusing MaxHeapTuplesPerPage for that is misleading:
> > the current value is the physical limit for heap tuples, as we have at
> > most 1 heap tuple per line pointer and thus the MaxHeapTuplesPerPage
> > won't change. A macro MaxHeapLinePointersPerPage would probably be
> > more useful, which could be as follows (assuming we don't want to
> > allow filling a page with effectively only dead line pointers):
>
> That's a good point. Sounds like it might be the right approach.
>
> I suppose that it will depend on how much use of MaxHeapTuplesPerPage
> remains once it is split in two like this.

Thinking about this some more, I wonder if it would make sense to
split MaxHeapTuplesPerPage into two new constants (a true
MaxHeapTuplesPerPage, plus MaxHeapLinePointersPerPage), for the
reasons discussed, but also as a way of getting a *smaller* effective
MaxHeapTuplesPerPage than 291 in some contexts only.

There are some ways in which the current MaxHeapTuplesPerPage isn't
enough, but also some ways in which it is excessive. It might be
useful if PageGetHeapFreeSpace() usually considered a heap page to
have no free space if the number of tuples with storage (or some cheap
proxy thereof) was about 227, which is the largest number of distinct
heap tuples that can *plausibly* ever be stored on an 8KiB page (it
ignores zero column tables). Most current PageGetHeapFreeSpace()
callers (including VACUUM) would continue to call that function in the
same way as today, and get this lower limit.

A few of the existing PageGetHeapFreeSpace() callers could store more
line pointers than that (MaxHeapLinePointersPerPage, which might be
510 in practice) -- but only those involved in updates. The overall
idea is to recognize that free space is not interchangeable -- updates
should have some kind of advantage over plain inserts when it comes to
the space on the page of the tuple that they're updating.

We might even want to make our newly defined, lower
MaxHeapTuplesPerPage into a tunable storage param.

-- 
Peter Geoghegan
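
To make that split concrete, a rough sketch of the idea (the function name
and limits are hypothetical, and the real PageGetHeapFreeSpace() in
bufpage.c additionally rechecks for recyclable line pointers once the
ceiling is reached):

#include "storage/bufpage.h"

/*
 * Hypothetical variant: callers pass the ceiling they care about, e.g.
 * roughly 227 for plain inserts and the larger line-pointer ceiling
 * (MaxHeapLinePointersPerPage) for update paths that want to keep
 * successor versions on the same page.
 */
static Size
heap_free_space_with_limit(Page page, OffsetNumber limit)
{
	Size		space = PageGetFreeSpace(page);

	/* Report the page as full once it already carries 'limit' line pointers. */
	if (space > 0 && PageGetMaxOffsetNumber(page) >= limit)
		space = 0;

	return space;
}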




Re: trigger example for plsample

2022-04-08 Thread Mark Wong
On Thu, Apr 07, 2022 at 06:29:53PM -0400, Tom Lane wrote:
> Chapman Flack  writes:
> > v4 looks good to me.
> 
> Pushed with very minor editorialization.  Mainly, I undid the
> decision to stop printing the function source text, on the
> grounds that (1) it falsified the comment immediately above,
> and (2) if you have to print it anyway to avoid compiler warnings,
> you're just creating confusing inconsistency between the two
> handler functions.

Sounds good to me, thanks!

Regards,
Mark




Re: avoid multiple hard links to same WAL file after a crash

2022-04-08 Thread Nathan Bossart
On Fri, Apr 08, 2022 at 09:53:12AM -0700, Nathan Bossart wrote:
> On Fri, Apr 08, 2022 at 10:38:03AM -0400, Robert Haas wrote:
>> I'd actually be in favor of nuking durable_rename_excl() from orbit
>> and putting the file-exists tests in the callers. Otherwise, someone
>> might assume that it actually has the semantics that its name
>> suggests, which could be pretty disastrous. If we don't want to do
>> that, then I'd changing to do the stat-then-durable-rename thing
>> internally, so we don't leave hard links lying around in *any* code
>> path. Perhaps that's the right answer for the back-branches in any
>> case, since there could be third-party code calling this function.
> 
> I think there might be another problem.  The man page for rename() seems to
> indicate that overwriting an existing file also introduces a window where
> the old and new path are hard links to the same file.  This isn't a problem
> for the WAL files because we should never be overwriting an existing one,
> but I wonder if it's a problem for other code paths.  My guess is that many
> code paths that overwrite an existing file are first writing changes to a
> temporary file before atomically replacing the original.  Those paths are
> likely okay, too, as you can usually just discard any existing temporary
> files.

Ha, so there are only a few callers of durable_rename_excl() in the
PostgreSQL tree.  One is basic_archive.c, which is already doing a stat()
check.  IIRC I only used durable_rename_excl() here to handle the case
where multiple servers are writing archives to the same location.  If that
happened, the archiver process would begin failing.  If a crash left two
hard links to the same file around, we will silently succeed the next time
around thanks to the compare_files() check.  Besides the WAL installation
code, the only other callers are in timeline.c, and both note that the use
of durable_rename_excl() is for "paranoidly trying to avoid overwriting an
existing file (there shouldn't be one)."

So AFAICT basic_archive.c is the only caller with a strong reason for using
durable_rename_excl(), and even that might not be worth keeping it around.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Defer selection of asynchronous subplans until the executor initialization stage

2022-04-08 Thread Etsuro Fujita
Hi,

On Fri, Apr 8, 2022 at 9:43 PM Justin Pryzby  wrote:
> This patch seems to be causing the planner to crash.
> Here's a query reduced from sqlsmith.
>
> | explain SELECT 1 FROM information_schema.constraint_column_usage WHERE 1 <= 
> pg_trigger_depth();
>
> Program terminated with signal SIGSEGV, Segmentation fault.

Reproduced.  Will look into this.

Thanks for the report!

Best regards,
Etsuro Fujita




Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2022 at 6:17 AM Robert Haas  wrote:
> I agree that the value of 291 is pretty much accidental, but it also
> seems fairly generous to me. The bigger you make it, the more space
> you can waste. I must have missed (or failed to understand) previous
> discussions about why raising it would be a good idea.

What do you mean about wasting space? Wasting space on the stack? I
can't imagine you meant wasting space on the page, since being able to
accommodate more items on each heap page seems like it would be
strictly better, barring any unintended weird FSM issues.

As far as I know the only real downside to increasing it is the impact
on tidbitmap.c. Increasing the number of valid distinct TID values
might have a negative impact on performance during bitmap scans, which
will need to be managed. However, I don't think that increased stack
space usage will be a problem, with a little work. It either won't
matter at all (e.g. an array of offset numbers on the stack still
won't be very big), or it can be fixed locally where it turns out to
matter (like in lazy_scan_prune).

We used to routinely use MaxOffsetNumber for arrays of item offset
numbers. I cut down on that in the B-Tree code, reducing it to
MaxIndexTuplesPerPage (which is typically 407) in a few places. So
anything close to our current MaxIndexTuplesPerPage ought to be fine
for most individual arrays stored on the stack.

-- 
Peter Geoghegan




Re: avoid multiple hard links to same WAL file after a crash

2022-04-08 Thread Nathan Bossart
On Fri, Apr 08, 2022 at 10:38:03AM -0400, Robert Haas wrote:
> I see that durable_rename_excl() has the following comment: "Similar
> to durable_rename(), except that this routine tries (but does not
> guarantee) not to overwrite the target file." If those are the desired
> semantics, we could achieve them more simply and more safely by just
> trying to stat() the target file and then, if it's not found, call
> durable_rename(). I think that would be a heck of a lot safer than
> what this function is doing right now.

IIUC it actually does guarantee that you won't overwrite the target file
when HAVE_WORKING_LINK is defined.  If not, it provides no guarantees at
all.  Using stat() before rename() would therefore weaken this check for
systems with working link(), but it'd probably strengthen it for systems
without a working link().

> I'd actually be in favor of nuking durable_rename_excl() from orbit
> and putting the file-exists tests in the callers. Otherwise, someone
> might assume that it actually has the semantics that its name
> suggests, which could be pretty disastrous. If we don't want to do
> that, then I'd changing to do the stat-then-durable-rename thing
> internally, so we don't leave hard links lying around in *any* code
> path. Perhaps that's the right answer for the back-branches in any
> case, since there could be third-party code calling this function.

I think there might be another problem.  The man page for rename() seems to
indicate that overwriting an existing file also introduces a window where
the old and new path are hard links to the same file.  This isn't a problem
for the WAL files because we should never be overwriting an existing one,
but I wonder if it's a problem for other code paths.  My guess is that many
code paths that overwrite an existing file are first writing changes to a
temporary file before atomically replacing the original.  Those paths are
likely okay, too, as you can usually just discard any existing temporary
files.

> Your proposed fix is OK if we don't want to do any of that stuff, but
> personally I'm much more inclined to blame durable_rename_excl() for
> being horrible than I am to blame the calling code for using it
> improvidently.

I do agree that it's worth examining this stuff a bit closer.  I've
frequently found myself trying to reason about all the different states
that callers of these functions can produce, so any changes that help
simplify matters are a win in my book.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
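
As a sketch of the stat-then-durable-rename shape Robert describes above,
roughly as it might look inside fd.c (the function name is made up here,
and a real version would want to distinguish ENOENT from other stat()
failures):

#include <sys/stat.h>

static int
durable_rename_if_absent(const char *oldfile, const char *newfile, int elevel)
{
	struct stat st;

	if (stat(newfile, &st) == 0)
	{
		ereport(elevel,
				(errcode_for_file_access(),
				 errmsg("file \"%s\" already exists", newfile)));
		return -1;
	}

	/* Plain rename path: no link()/unlink(), so no leftover hard links. */
	return durable_rename(oldfile, newfile, elevel);
}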




Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?

2022-04-08 Thread SATYANARAYANA NARLAPURAM
On Fri, Apr 8, 2022 at 6:44 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Wed, Apr 6, 2022 at 4:30 PM Ashutosh Bapat
>  wrote:
> >
> > On Tue, Apr 5, 2022 at 9:23 PM Bharath Rupireddy
> >  wrote:
> > >
> > > Hi,
> > >
> > > I'm thinking if there's a way in core postgres to achieve $subject. In
> > > reality, the sync/async standbys can either be closer/farther (which
> > > means sync/async standbys can receive WAL at different times) to
> > > primary, especially in cloud HA environments with primary in one
> > > Availability Zone(AZ)/Region and standbys in different AZs/Regions.
> > > $subject may not be possible on dev systems (say, for testing some HA
> > > features) unless we can inject a delay in WAL senders before sending
> > > WAL.
>

Simulation will be helpful even for end customers, to inject faults in
production environments during availability zone/disaster recovery drills.



> > >
> > > How about having two developer-only GUCs {async,
> > > sync}_wal_sender_delay? When set, the async and sync WAL senders will
> > > delay sending WAL by {async, sync}_wal_sender_delay
> > > milliseconds/seconds? Although, I can't think of any immediate use, it
> > > will be useful someday IMO, say for features like [1], if it gets in.
> > > With this set of GUCs, one can even add core regression tests for HA
> > > features.
>

I would suggest doing this at the slot level, instead of two GUCs that
control the behavior of all the slots (physical/logical). Something like
"pg_suspend_replication_slot" and "pg_resume_replication_slot"?
Alternatively, a GUC on the standby side instead of the primary, so that the WAL
receiver stops responding to the WAL sender? This helps achieve the same as
above, but the granularity is now at the individual replica level.

Thanks,
Satya
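
For what it's worth, the GUC-based proposal quoted above would amount to
something like the following in the WAL sender's send path. The GUC names
and the sync/async distinction are hypothetical; in practice, whether a
given walsender is currently synchronous would still have to be determined
from the synchronous-standby configuration:

/* Hypothetical developer-only GUCs, in milliseconds. */
int			sync_wal_sender_delay = 0;
int			async_wal_sender_delay = 0;

static void
maybe_delay_wal_send(bool is_sync_standby)
{
	int			delay_ms = is_sync_standby ?
		sync_wal_sender_delay : async_wal_sender_delay;

	if (delay_ms > 0)
		pg_usleep(delay_ms * 1000L);	/* milliseconds to microseconds */
}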


Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2022 at 4:38 AM Matthias van de Meent
 wrote:
> Yeah, I think we should definitely support more line pointers on a
> heap page, but abusing MaxHeapTuplesPerPage for that is misleading:
> the current value is the physical limit for heap tuples, as we have at
> most 1 heap tuple per line pointer and thus the MaxHeapTuplesPerPage
> won't change. A macro MaxHeapLinePointersPerPage would probably be
> more useful, which could be as follows (assuming we don't want to
> allow filling a page with effectively only dead line pointers):

That's a good point. Sounds like it might be the right approach.

I suppose that it will depend on how much use of MaxHeapTuplesPerPage
remains once it is split in two like this.

-- 
Peter Geoghegan
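
The macro Matthias refers to above is not reproduced in this excerpt;
purely as a sketch of the idea (not necessarily his exact definition),
sized so that each smallest possible live tuple can leave behind roughly
one extra dead line pointer, which works out to about 510 on an 8KiB page
versus 291 for MaxHeapTuplesPerPage:

#define MaxHeapLinePointersPerPage \
	((int) ((BLCKSZ - SizeOfPageHeaderData) / \
			(MAXALIGN(SizeofHeapTupleHeader) + 2 * sizeof(ItemIdData)) * 2))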




Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

2022-04-08 Thread Tom Lane
Andres Freund  writes:
> On 2022-04-07 13:57:45 -0400, Tom Lane wrote:
>> Yeah, with only one instance it could just be cosmic rays or something.

Not cosmic rays: skink has shown the same symptom three times running.
Looks like maybe the archive_cleanup_command itself is doing something
it shouldn't?

regards, tom lane




Re: Defer selection of asynchronous subplans until the executor initialization stage

2022-04-08 Thread Zhihong Yu
On Fri, Apr 8, 2022 at 5:43 AM Justin Pryzby  wrote:

> On Wed, Apr 06, 2022 at 03:58:29PM +0900, Etsuro Fujita wrote:
> > I have committed the patch after modifying it as such.  (I think we
> > can improve these later, if necessary.)
>
> This patch seems to be causing the planner to crash.
> Here's a query reduced from sqlsmith.
>
> | explain SELECT 1 FROM information_schema.constraint_column_usage WHERE 1
> <= pg_trigger_depth();
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x55b4396a2edf in trivial_subqueryscan (plan=0x7f4219ed93b0) at
> ../../../../src/include/nodes/pg_list.h:151
> 151 return l ? l->length : 0;
> (gdb) bt
> #0  0x55b4396a2edf in trivial_subqueryscan (plan=0x7f4219ed93b0) at
> ../../../../src/include/nodes/pg_list.h:151
> #1  0x55b43968af89 in mark_async_capable_plan 
> (plan=plan@entry=0x7f4219ed93b0,
> path=path@entry=0x7f4219e89538) at createplan.c:1132
> #2  0x55b439691924 in create_append_plan (root=root@entry=0x55b43affb2b0,
> best_path=best_path@entry=0x7f4219ed0cb8, flags=flags@entry=0) at
> createplan.c:1329
> #3  0x55b43968fa21 in create_plan_recurse (root=root@entry=0x55b43affb2b0,
> best_path=best_path@entry=0x7f4219ed0cb8, flags=flags@entry=0) at
> createplan.c:421
> #4  0x55b43968f974 in create_projection_plan 
> (root=root@entry=0x55b43affb2b0,
> best_path=best_path@entry=0x7f4219ed0f60, flags=flags@entry=1) at
> createplan.c:2039
> #5  0x55b43968fa6f in create_plan_recurse (root=root@entry=0x55b43affb2b0,
> best_path=0x7f4219ed0f60, flags=flags@entry=1) at createplan.c:433
> #6  0x55b439690221 in create_plan (root=root@entry=0x55b43affb2b0,
> best_path=) at createplan.c:348
> #7  0x55b4396a1451 in standard_planner (parse=0x55b43af05e28,
> query_string=, cursorOptions=2048, boundParams=0x0) at
> planner.c:413
> #8  0x55b4396a19c1 in planner (parse=parse@entry=0x55b43af05e28,
> query_string=query_string@entry=0x55b43af04c40 "SELECT 1 FROM
> information_schema.constraint_column_usage WHERE 1 > pg_trigger_depth();",
> cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
> at planner.c:277
> #9  0x55b439790c78 in pg_plan_query 
> (querytree=querytree@entry=0x55b43af05e28,
> query_string=query_string@entry=0x55b43af04c40 "SELECT 1 FROM
> information_schema.constraint_column_usage WHERE 1 > pg_trigger_depth();",
> cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
> at postgres.c:883
> #10 0x55b439790d54 in pg_plan_queries (querytrees=0x55b43afdd528,
> query_string=query_string@entry=0x55b43af04c40 "SELECT 1 FROM
> information_schema.constraint_column_usage WHERE 1 > pg_trigger_depth();",
> cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0)
> at postgres.c:975
> #11 0x55b439791239 in exec_simple_query
> (query_string=query_string@entry=0x55b43af04c40 "SELECT 1 FROM
> information_schema.constraint_column_usage WHERE 1 > pg_trigger_depth();")
> at postgres.c:1169
> #12 0x55b439793183 in PostgresMain (dbname=,
> username=) at postgres.c:4542
> #13 0x55b4396e6af7 in BackendRun (port=port@entry=0x55b43af2ffe0) at
> postmaster.c:4489
> #14 0x55b4396e9c03 in BackendStartup (port=port@entry=0x55b43af2ffe0)
> at postmaster.c:4217
> #15 0x55b4396e9e4a in ServerLoop () at postmaster.c:1791
> #16 0x55b4396eb401 in PostmasterMain (argc=7, argv=) at
> postmaster.c:1463
> #17 0x55b43962b4df in main (argc=7, argv=0x55b43aeff0c0) at main.c:202
>
> Actually, the original query failed like this:
> #2  0x55b4398e9f90 in ExceptionalCondition
> (conditionName=conditionName@entry=0x55b439a61238 "plan->scanstatus ==
> SUBQUERY_SCAN_UNKNOWN", errorType=errorType@entry=0x55b43994b00b
> "FailedAssertion",
> #3  0x55b4396a2ecf in trivial_subqueryscan (plan=0x55b43b59cac8) at
> setrefs.c:1367
>

Hi,
I logged the value of plan->scanstatus before the assertion:

2022-04-08 16:20:59.601 UTC [26325] LOG:  scan status 0
2022-04-08 16:20:59.601 UTC [26325] STATEMENT:  explain SELECT 1 FROM
information_schema.constraint_column_usage WHERE 1 <= pg_trigger_depth();
2022-04-08 16:20:59.796 UTC [26296] LOG:  server process (PID 26325) was
terminated by signal 11: Segmentation fault

It seems its value was SUBQUERY_SCAN_UNKNOWN.

Still trying to find out the cause for the crash.


Re: Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint)

2022-04-08 Thread Andres Freund
Hi, 

On April 8, 2022 7:52:07 AM PDT, Matthias van de Meent 
 wrote:
>On Sat, 19 Mar 2022 at 01:15, Andres Freund  wrote:
>> pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664
>>
>> pg_rewrite is the second biggest relation in an empty database already...
>
>Yeah, that's not great. Thanks for nerd-sniping me into looking into
>how views and pg_rewrite rules work, that was very interesting and I
>learned quite a lot.

Thanks for looking!


># Immediately potential, limited to progress views
>
>I noticed that the CASE-WHEN (used in translating progress stage index
>to stage names) in those progress reporting views can be more
>efficiently described (althoug with slightly worse behaviour around
>undefined values) using text array lookups (as attached). That
>resulted in somewhat smaller rewrite entries for the progress views
>(toast compression was good old pglz):
>
>template1=# SELECT sum(octet_length(ev_action)),
>SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE
>ev_class::regclass::text LIKE '%progress%';
>
>master:
>  sum  |  sum
>---+---
> 97277 | 19956
>patched:
>  sum  |  sum
>---+---
> 77069 | 18417
>
>So this seems like a nice improvement of 20% uncompressed / 7% compressed.
>
>I tested various cases of phase number to text translations: `CASE ..
>WHEN`; `(ARRAY[]::text[])[index]` and `('{}'::text[])[index]`. See
>results below:
>
>postgres=# create or replace view arrayliteral_view as select
>(ARRAY['a','b','c','d','e','f']::text[])[index] as name from tst
>s(index);
>CREATE INDEX
>postgres=# create or replace view stringcast_view as select
>('{a,b,c,d,e,f}'::text[])[index] as name from tst s(index);
>CREATE INDEX
>postgres=# create or replace view split_stringcast_view as select
>(('{a,b,' || 'c,d,e,f}')::text[])[index] as name from tst s(index);
>CREATE VIEW
>postgres=# create or replace view case_view as select case index when
>0 then 'a' when 1 then 'b' when 2 then 'c' when 3 then 'd' when 4 then
>'e' when 5 then 'f' end as name from tst s(index);
>CREATE INDEX
>
>
>postgres=# postgres=# select ev_class::regclass::text,
>octet_length(ev_action), pg_column_size(ev_action) from pg_rewrite
>where ev_class in ('arrayliteral_view'::regclass::oid,
>'case_view'::regclass::oid, 'split_stringcast_view'::regclass::oid,
>'stringcast_view'::regclass::oid);
>   ev_class| octet_length | pg_column_size
>---+--+
> arrayliteral_view | 3311 |   1322
> stringcast_view   | 2610 |   1257
> case_view | 5170 |   1412
> split_stringcast_view | 2847 |   1350
>
>It seems to me that we could consider replacing the CASE statements
>with array literals and lookups if we really value our template
>database size. But, as text literal concatenations don't seem to get
>constant folded before storing them in the rules table, this rewrite
>of the views would result in long lines in the system_views.sql file,
>or we'd have to deal with the additional overhead of the append
>operator and cast nodes.

My inclination is that the mapping functions should be C functions. There's 
really no point in doing it in SQL and it comes at a noticeable price. And, if 
done in C, we can fix mistakes in minor releases, which we can't in SQL.


># Future work; nodeToString / readNode, all rewrite rules
>
>Additionally, we might want to consider other changes like default (or
>empty value) elision in nodeToString, if that is considered a
>reasonable option and if we really want to reduce the size of the
>pg_rewrite table.
>
>I think a lot of space can be recovered from that: A manual removal of
>what seemed to be fields with default values (and the removal of all
>query location related fields) in the current definition of
>pg_stat_progress_create_index reduces its uncompressed size from
>23226B raw and 4204B compressed to 13821B raw and 2784B compressed,
>for an on-disk space saving of 33% for this view's ev_action.
>
>Do note, however, that that would add significant branching in the
>nodeToString and readNode code, which might slow down that code
>significantly. I'm not planning on working on that; but in my opinion
>that is a viable path to reducing the size of new database catalogs.

We should definitely be careful about that. I do agree that there's a lot of 
efficiency to be gained in the serialization format. Once we have the automatic 
node func generation in place, we could have one representation for human 
consumption, and one for density...

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
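
A sketch of what such a C-level mapping function could look like for one of
the progress views. The function name, the phase list, and its SQL-callable
declaration (pg_proc entry or PG_FUNCTION_INFO_V1, depending on where it
lives) are placeholders, not existing code:

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

Datum
pg_stat_progress_phase_name(PG_FUNCTION_ARGS)
{
	int32		phase = PG_GETARG_INT32(0);

	/* Placeholder phase names; the real list belongs with the view's C code. */
	static const char *const names[] = {
		"initializing",
		"in progress",
		"finalizing"
	};

	if (phase < 0 || phase >= (int) lengthof(names))
		PG_RETURN_NULL();

	PG_RETURN_TEXT_P(cstring_to_text(names[phase]));
}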




Re: remove more archiving overhead

2022-04-08 Thread Nathan Bossart
On Fri, Apr 08, 2022 at 10:20:27AM -0400, Robert Haas wrote:
> On Thu, Apr 7, 2022 at 6:23 PM Nathan Bossart  
> wrote:
>> On Thu, Feb 24, 2022 at 09:55:53AM -0800, Nathan Bossart wrote:
>> > Yes.  I found that a crash at an unfortunate moment can produce multiple
>> > links to the same file in pg_wal, which seemed bad independent of archival.
>> > By fixing that (i.e., switching from durable_rename_excl() to
>> > durable_rename()), we not only avoid this problem, but we also avoid trying
>> > to archive a file the server is concurrently writing.  Then, after a crash,
>> > the WAL file to archive should either not exist (which is handled by the
>> > archiver) or contain the same contents as any preexisting archives.
>>
>> I moved the fix for this to a new thread [0] since I think it should be
>> back-patched.  I've attached a new patch that only contains the part
>> related to reducing archiving overhead.
> 
> While we're now after the feature freeze and thus this will need to
> wait for v16, it looks like a reasonable change to me.

Dang, just missed it.  Thanks for taking a look.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Temporary file access API

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 4:54 AM Antonin Houska  wrote:
> Do you think that the use of a system call is a problem itself (e.g. because
> the code looks less simple if read/write is used somewhere and fread/fwrite
> elsewhere; of course of read/write is mandatory in special cases like WAL,
> heap pages, etc.)  or is the problem that the system calls are used too
> frequently? I suppose only the latter.

I'm not really super-interested in reducing the number of system
calls. It's not a dumb thing in which to be interested and I know that
for example Thomas Munro is very interested in it and has done a bunch
of work in that direction just to improve performance. But for me the
attraction of this is mostly whether it gets us closer to TDE.

And that's why I'm asking these questions about adopting it in
different places. I kind of thought that your previous patches needed
to encrypt, I don't know, 10 or 20 different kinds of files. So I was
surprised to see this patch touching the handling of only 2 kinds of
files. If we consolidate the handling of let's say 15 of 20 cases into
a single mechanism, we've really moved the needle in the right
direction -- but consolidating the handling of 2 of 20 cases, or
whatever the real numbers are, isn't very exciting.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: shared-memory based stats collector

2022-04-08 Thread Andres Freund
Hi, 

On April 8, 2022 4:49:48 AM PDT, Ranier Vilela  wrote:
>Hi,
>
>Per Coverity.
>
>pgstat_reset_entry does not check if the lock was really acquired.
>I think shared_stat_reset_contents could then be called without the lock;
>is that an issue or not?

I don't think so - the nowait parameter is set to false, so the lock 
acquisition is blocking.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Atomic rename feature for Windows.

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 11:45 AM Greg Stark  wrote:
> But that's useful for some things and not for others. Like, it's
> useful to be sure we don't have odd dependencies on timing quirks of
> the specific machines that are currently common, or depend on gcc/llvm
> compiler behaviour that isn't guaranteed. But less so for supporting
> some quirky filesystem behaviour on Windows 8 that newer Windows
> doesn't have and Unix guarantees not to have. (Or supporting non-IEEE
> Vax FP now that we've decided we just don't any more).

Yeah, exactly.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 4:47 AM Markus Wanner
 wrote:
> I agree with Michael, it would be nice to not duplicate the code, but
> use a common underlying method.  A modified patch is attached.

I don't think this is better, but I don't think it's worth arguing
about, either, so I'll do it this way if nobody objects.

Meanwhile, I've committed the patch for master to master.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Atomic rename feature for Windows.

2022-04-08 Thread Greg Stark
On Fri, 8 Apr 2022 at 11:30, Robert Haas  wrote:
>
> On Fri, Apr 8, 2022 at 10:12 AM Greg Stark  wrote:
> > On Thu, 9 Dec 2021 at 23:36, Tom Lane  wrote:
> > > I'm not for dropping support for some platform just because it's old.
> >
> > I guess I'll have to spin up the Vax again :)
>
> This is a pretty good summary of what's wrong with our current
> deprecation policy.

I didn't intend it that way but, ok.

> Like Tom, I kind of hate removing support for old
> systems. But I've also come to realize that we often end up supporting
> systems because there's one PostgreSQL developer who has access and
> sets up a buildfarm member ... which tends to mean that we support all
> the stuff that lots of people are using, plus a pretty random subset
> of older systems that do funny things and most people can't access to
> debug any problems that may occur. And that's kind of annoying.

Generally I think supporting older systems that do funny things is
helpful in avoiding problems that either 1) Can happen on newer
systems but rarely 2) Can happen on other systems that people are
using but we don't know about and aren't testing and 3) Can happen on
future systems or future compilers and we might not even find out
about.

But that's useful for some things and not for others. Like, it's
useful to be sure we don't have odd dependencies on timing quirks of
the specific machines that are currently common, or depend on gcc/llvm
compiler behaviour that isn't guaranteed. But less so for supporting
some quirky filesystem behaviour on Windows 8 that newer Windows
doesn't have and Unix guarantees not to have. (Or supporting non-IEEE
Vax FP now that we've decided we just don't any more).

-- 
greg




Re: Commitfest Closed

2022-04-08 Thread Greg Stark
I moved to next CF almost all the Needs Review and Waiting on Author patches.

The remaining ones are either:

1) Bug fixes, Documentation, or testing patches that we may want to
make Open Issues

2) Patches that look like we may want to mark Rejected or Returned
with Feedback and start a new discussion

3) Patches whose email history confused me, such as where multiple
patches are under discussion

I also haven't gone through the Ready for Committer patches yet. I'll
do that at the end of the day.

Incidentally I marked a lot of the Waiting on Author patches as Needs
Review before moving to the next CF because generally I think they
were only Waiting on Author because of the cfbot failures and they
were waiting on design feedback.

Also, as another aside, I find a lot of the patches that haven't been
reviewed were patches that were posted without any specific concerns
or questions. That tends to imply the author thinks the patch is ready
and just waiting on a comprehensive review which is a daunting task.

I would suggest if you're an author posting a WIP and there's some
specific uncertainties that you have about the patch that asking about
them would encourage reviewers to dive in and help you make progress.




Re: Kerberos delegation support in libpq and postgres_fdw

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 11:29 AM Stephen Frost  wrote:
> > +  allow_cred_delegation
> >
> > First, I again recommend not choosing words at random to abbreviate.
> > "delegate_credentials" would be shorter and clearer. Second, I think
> > we need to decide whether we envision just having one parameter here
> > for every kind of credential delegation that libpq might ever support,
> > or whether this is really something specific to GSS. If the latter,
> > the name should mention GSS.
>
> delegate_credentials seems to imply that the server has some kind of
> control over the act of delegating credentials, which isn't really the
> case.  The client has to decide to delegate credentials and it does that
> independent of the server- the server side just gets to either accept
> those delegated credentials, or ignore them.

Oh ... I thought this was a libpq parameter to control the client
behavior. I guess I didn't read it carefully enough.

> Regarding the client side, it is the case that GSSAPIDelegateCredentials
> in ssh defaults to no, so it seems like the next iteration of the patch
> should probably include a libpq option similar to that ssh_config
> option.  As I mentioned before, users already can decide if they'd like
> proxyable credentials or not when they kinit, though more generally this
> is set as a environment-wide policy, but we can add an option and
> disable it by default.

+1.

> This isn't actually something we have a choice in, really, it's from the
> Kerberos library.  MEMORY is the library's in-memory credential cache.
> Other possible values are FILE:/some/file, DIR:/some/dir, API:, and
> others.  Documentaton is available here:
> https://web.mit.edu/kerberos/krb5-1.12/doc/basic/ccache_def.html

Well, I was just going by the fact that this string ("MEMORY:") seems
to be being interpreted in our code, not the library.

> > I wonder whether we really quite this many cases. But if we do they
> > probably need better and more consistent naming.
>
> I wouldn't want to end up with values that could end up conflicting with
> real values that a user might want to specify, so the choice of
> 'environment' and empty-value were specifically chosen to avoid that
> risk.  If we're worried that doing so isn't sufficient or is too
> confusing, the better option would likely be to have another GUC that
> controls if we unset, ignore, or set the value to what the other GUC
> says to set it to.  I'm fine with that if you agree.

Yeah, I thought of that, and it might be the way to go. I wasn't too
sure we needed the explicit-unset behavior as an option, but I defer
to you on that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
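
To illustrate the unset/ignore/set behaviour under discussion, roughly what
the server side would do with such a setting. The special values follow the
patch text quoted in this thread, but the function and the GUC plumbing are
only a sketch, not committed code:

#include <stdlib.h>
#include <string.h>

static void
apply_credential_cache_setting(const char *ccache)
{
	if (ccache == NULL || strcmp(ccache, "environment") == 0)
		return;					/* leave KRB5CCNAME untouched */

	if (ccache[0] == '\0')
		unsetenv("KRB5CCNAME");	/* fall back to the library default */
	else
		setenv("KRB5CCNAME", ccache, 1);	/* e.g. "MEMORY:" or "FILE:/..." */
}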




Re: Size of pg_rewrite

2022-04-08 Thread Matthias van de Meent
On Fri, 8 Apr 2022 at 17:20, Dagfinn Ilmari Mannsåker  wrote:
>
> Matthias van de Meent  writes:
>
> > But, as text literal concatenations don't seem to get constant folded
> > before storing them in the rules table, this rewrite of the views
> > would result in long lines in the system_views.sql file, or we'd have
> > to deal with the additional overhead of the append operator and cast
> > nodes.
>
> There is no need to use the concatenation operator to split array
> constants across multiple lines. Newlines are fine either inside the
> string (between array elements), or between two string literals
> (which become one string constant at parse time).

Ah, neat, that saves some long lines in the system_views file. I had
already tried the "auto-concatenate two consecutive string literals",
but that try failed in initdb, so now I'm not sure what happened
there.

Thanks!

-Matthias




Re: Atomic rename feature for Windows.

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 10:12 AM Greg Stark  wrote:
> On Thu, 9 Dec 2021 at 23:36, Tom Lane  wrote:
> > I'm not for dropping support for some platform just because it's old.
>
> I guess I'll have to spin up the Vax again :)

This is a pretty good summary of what's wrong with our current
deprecation policy. Like Tom, I kind of hate removing support for old
systems. But I've also come to realize that we often end up supporting
systems because there's one PostgreSQL developer who has access and
sets up a buildfarm member ... which tends to mean that we support all
the stuff that lots of people are using, plus a pretty random subset
of older systems that do funny things and most people can't access to
debug any problems that may occur. And that's kind of annoying.

(I don't have a specific proposal for what to do about it.)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Kerberos delegation support in libpq and postgres_fdw

2022-04-08 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Apr 8, 2022 at 8:21 AM Stephen Frost  wrote:
> > Added an explicit 'environment' option to allow for, basically, existing
> > behavior, where we don't mess with the environment variable at all,
> > though I kept the default as MEMORY since I don't think it's really
> > typical that folks actually want regular user backends to inherit the
> > credential cache of the server.
> >
> > Added a few more tests and updated the documentation too.  Sadly, seems
> > we've missed the deadline for v15 though for lack of feedback on these.
> > Would really like to get some other folks commenting as these are new
> > pg_hba and postgresql.conf options being added.
> 
> I don't think this patch is quite baked enough to go in even if the
> deadline hadn't formally passed, but I'm happy to offer a few opinions
> ... especially if we can also try to sort out a plan for getting that
> wider-checksums thing you mentioned done for v16.

Sure.

>  + /* gssencmode is also libpq option, same to above. */
> + {"gssencmode", UserMappingRelationId, true},
> 
> I really hate names like this that are just a bunch of stuff strung
> together with no punctuation and some arbitrary abbreviations thrown
> in for good measure. But since the libpq parameter already exists it's
> hard to argue we should do anything else here.

Well, yeah.

> +  allow_cred_delegation
> 
> First, I again recommend not choosing words at random to abbreviate.
> "delegate_credentials" would be shorter and clearer. Second, I think
> we need to decide whether we envision just having one parameter here
> for every kind of credential delegation that libpq might ever support,
> or whether this is really something specific to GSS. If the latter,
> the name should mention GSS.

delegate_credentials seems to imply that the server has some kind of
control over the act of delegating credentials, which isn't really the
case.  The client has to decide to delegate credentials and it does that
independent of the server- the server side just gets to either accept
those delegated credentials, or ignore them. 

In terms of having a prefix, this is certainly something that I'd like
to see SSPI support added for as well (perhaps that can be in v16 too)
and so it's definitely not GSS-exclusive among the authentication
methods that we have today.  In that sense, this option falls into the
same category as 'include_realm' and 'krb_realm' in that it applies to
more than one, but not all, of our authentication methods.

> I also suggest that the default value of this option should be false,
> rather than true. I would be unhappy if ssh started defaulting to
> ForwardAgent=yes, because that's less secure and I don't want my
> credentials shared with random servers without me making a choice to
> do that. Similarly here I think we should default to the more secure
> option.

This is a bit backwards from how it works though- this option is about
if the server will accept delegated credentials, not if the client sends
them.  If your client was set to ForwardAgent=yes, would you be happy if
the server's default was AllowAgentForwarding=no?  (At least on the
system I'm looking at, the current default is AllowAgentForwarding=yes
in sshd_config).

Regarding the client side, it is the case that GSSAPIDelegateCredentials
in ssh defaults to no, so it seems like the next iteration of the patch
should probably include a libpq option similar to that ssh_config
option.  As I mentioned before, users already can decide if they'd like
proxyable credentials or not when they kinit, though more generally this
is set as a environment-wide policy, but we can add an option and
disable it by default.

> +  
> +   
> +Sets the location of the Kerberos credential cache to be used for
> +regular user backends which go through authentication.  The default 
> is
> +MEMORY:, which is where delegated credentials
> +are stored (and is otherwise empty).  Care should be used when 
> changing
> +this value- setting it to a file-based credential cache will mean 
> that
> +user backends could potentially use any credentials stored to access
> +other systems.
> +If this parameter is set to an empty string, then the variable will 
> be
> + explicit un-set and the system-dependent default is used, which may be a
> + file-based credential cache with the same caveats as previously
> + mentioned.  If the special value 'environment' is used, then the variable
> + is left untouched and will be whatever was set in the environment at
> + startup time.
> 
> "MEMORY:" seems like a pretty weird choice of arbitrary string. Is it
> supposed to look like a Windows drive letter or pseudo-device, or
> what? I'm not sure exactly what's better here, but I just think this
> doesn't look like anything else we've got today. And then we've got a
> second special environment, "environment", which looks 

Re: Size of pg_rewrite

2022-04-08 Thread Dagfinn Ilmari Mannsåker
Matthias van de Meent  writes:

> But, as text literal concatenations don't seem to get constant folded
> before storing them in the rules table, this rewrite of the views
> would result in long lines in the system_views.sql file, or we'd have
> to deal with the additional overhead of the append operator and cast
> nodes.

There is no need to use the concatenation operator to split array
constants across multiple lines. Newlines are fine either inside the
string (between array elements), or between two string literals
(which become one string constant at parse time).

https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS

ilmari@[local]:5432 ~=# select
ilmari@[local]:5432 ~-# '{foo,
ilmari@[local]:5432 ~'# bar}'::text[],
ilmari@[local]:5432 ~-# '{bar,'
ilmari@[local]:5432 ~-# 'baz}'::text[];
┌───┬───┐
│   text│   text│
├───┼───┤
│ {foo,bar} │ {bar,baz} │
└───┴───┘
(1 row)


- ilmari




Re: Mingw task for Cirrus CI

2022-04-08 Thread Melih Mutlu
> On windows that makes prep_buildtree go from 42.4s to 5.8s for me.
>

I applied Andres's faster prep build tree changes and triggered some cirrus
runs

Without these changes, preparing the build tree was taking around 42.3s
(sometimes even more) [1].
It seems like with these changes, it drops to around 8s [2].

[1] https://cirrus-ci.com/task/6562493345562624
[2] https://cirrus-ci.com/task/4836843802853376

Best,
Melih


Re: Commitfest Closed

2022-04-08 Thread Julien Rouhaud
On Fri, Apr 08, 2022 at 08:09:16AM -0700, Peter Geoghegan wrote:
> On Fri, Apr 8, 2022 at 5:58 AM Alvaro Herrera  wrote:
> > Thanks for herding through the CF!
> 
> +1

+1!




Re: [Proposal] vacuumdb --schema only

2022-04-08 Thread Gilles Darold

Le 08/04/2022 à 02:46, Justin Pryzby a écrit :

On Wed, Apr 06, 2022 at 07:43:42PM +0200, Gilles Darold wrote:

Thanks for the review, all these changes are available in new version v6
of the patch and attached here.

This is failing in CI (except on macos, which is strangely passing).
http://cfbot.cputube.org/gilles-darold.html

https://api.cirrus-ci.com/v1/artifact/task/5379693443547136/log/src/bin/scripts/tmp_check/log/regress_log_100_vacuumdb

not ok 59 - vacuumdb --schema "Foo" postgres exit code 0

#   Failed test 'vacuumdb --schema "Foo" postgres exit code 0'
#   at t/100_vacuumdb.pl line 151.
not ok 60 - vacuumdb --schema schema only: SQL found in server log

#   Failed test 'vacuumdb --schema schema only: SQL found in server log'
#   at t/100_vacuumdb.pl line 151.
#   '2022-04-06 18:15:36.313 UTC [34857][not initialized] 
[[unknown]][:0] LOG:  connection received: host=[local]
# 2022-04-06 18:15:36.314 UTC [34857][client backend] [[unknown]][3/2801:0] 
LOG:  connection authorized: user=postgres database=postgres 
application_name=100_vacuumdb.pl
# 2022-04-06 18:15:36.318 UTC [34857][client backend] 
[100_vacuumdb.pl][3/2802:0] LOG:  statement: SELECT 
pg_catalog.set_config('search_path', '', false);
# 2022-04-06 18:15:36.586 UTC [34857][client backend] [100_vacuumdb.pl][:0] 
LOG:  disconnection: session time: 0:00:00.273 user=postgres database=postgres 
host=[local]
# '
# doesn't match '(?^:VACUUM "Foo".bar)'



Ok, got it with the help of rjuju. Actually it was compiling well using
gcc, but clang gives some warnings. Fixing those warnings makes CI happy.



Attached v7 of the patch that should pass cfbot.

--
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..0de001ef24 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+   
+
+ 
+  -n
+  --schema
+ 
+ schema
+
+   
+
+   
+
+ 
+  -N
+  --exclude-schema
+ 
+ schema
+
+   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='foo' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..7bbfb97246 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,15 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".bar/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t', 'pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-N',  '"Foo"', 'postgres' ],
+	'cannot use option -n | --schema and -N | --exclude-schema at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a',  '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 
 done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4f6917fd39..f118b05169 100644
--- a/src/bin/scripts/vacuumdb.c
+++ 

Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 10:45 AM Thom Brown  wrote:
> Thanks. This doesn't include my self-correction:
>
> s/kept on standby/kept on the standby/

Here is v2, endeavoring to rectify that oversight.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-docs-Note-the-recovery_min_apply_delay-bloats-pg_.patch
Description: Binary data


Re: Commitfest Closed

2022-04-08 Thread Peter Geoghegan
On Fri, Apr 8, 2022 at 5:58 AM Alvaro Herrera  wrote:
> Thanks for herding through the CF!

+1

-- 
Peter Geoghegan




Re: Mark all GUC variable as PGDLLIMPORT

2022-04-08 Thread Julien Rouhaud
On Fri, Apr 08, 2022 at 03:04:18PM +0200, Magnus Hagander wrote:
> On Fri, Apr 8, 2022 at 2:42 PM Robert Haas  wrote:
> 
> > On Wed, Apr 6, 2022 at 7:56 PM Michael Paquier 
> > wrote:
> > > On Wed, Apr 06, 2022 at 12:57:29AM +0700, John Naylor wrote:
> > > > For these two patches, I'd say a day or two after feature freeze is a
> > > > reasonable goal.
> > >
> > > Yeah.  For patches as invasive as the PGDLLIMPORT business and the
> > > frontend error refactoring, I am also fine to have two exceptions with
> > > the freeze deadline.
> >
> > Done now.
> >
> 
> \o/

Woohoo!  Thanks a lot!




Re: Mingw task for Cirrus CI

2022-04-08 Thread Alvaro Herrera
On 2022-Apr-07, Andres Freund wrote:

> Since dash won't help us to get the build time down sufficiently, and the
> tests don't pass without a separate build tree, I looked at what makes
> config/prep_buildtree so slow.

Maybe we can replace prep_buildtree with a Perl script.  Surely that
should be faster.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)




Re: Kerberos delegation support in libpq and postgres_fdw

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 8:21 AM Stephen Frost  wrote:
> Added an explicit 'environment' option to allow for, basically, existing
> behavior, where we don't mess with the environment variable at all,
> though I kept the default as MEMORY since I don't think it's really
> typical that folks actually want regular user backends to inherit the
> credential cache of the server.
>
> Added a few more tests and updated the documentation too.  Sadly, seems
> we've missed the deadline for v15 though for lack of feedback on these.
> Would really like to get some other folks commenting as these are new
> pg_hba and postgresql.conf options being added.

Hi,

I don't think this patch is quite baked enough to go in even if the
deadline hadn't formally passed, but I'm happy to offer a few opinions
... especially if we can also try to sort out a plan for getting that
wider-checksums thing you mentioned done for v16.

 + /* gssencmode is also libpq option, same to above. */
+ {"gssencmode", UserMappingRelationId, true},

I really hate names like this that are just a bunch of stuff strung
together with no punctuation and some arbitrary abbreviations thrown
in for good measure. But since the libpq parameter already exists it's
hard to argue we should do anything else here.

+  allow_cred_delegation

First, I again recommend not choosing words at random to abbreviate.
"delegate_credentials" would be shorter and clearer. Second, I think
we need to decide whether we envision just having one parameter here
for every kind of credential delegation that libpq might ever support,
or whether this is really something specific to GSS. If the latter,
the name should mention GSS.

I also suggest that the default value of this option should be false,
rather than true. I would be unhappy if ssh started defaulting to
ForwardAgent=yes, because that's less secure and I don't want my
credentials shared with random servers without me making a choice to
do that. Similarly here I think we should default to the more secure
option.

+  
+   
+Sets the location of the Kerberos credential cache to be used for
+regular user backends which go through authentication.  The default is
+MEMORY:, which is where delegated credentials
+are stored (and is otherwise empty).  Care should be used when changing
+this value- setting it to a file-based credential cache will mean that
+user backends could potentially use any credentials stored to access
+other systems.
+If this parameter is set to an empty string, then the variable will be
+ explicit un-set and the system-dependent default is used, which may be a
+ file-based credential cache with the same caveats as previously
+ mentioned.  If the special value 'environment' is used, then the variable
+ is left untouched and will be whatever was set in the environment at
+ startup time.

"MEMORY:" seems like a pretty weird choice of arbitrary string. Is it
supposed to look like a Windows drive letter or pseudo-device, or
what? I'm not sure exactly what's better here, but I just think this
doesn't look like anything else we've got today. And then we've got a
second special environment, "environment", which looks completely
different: now it's lower-case and without the colon. And then empty
string is special too.

I wonder whether we really need quite this many cases. But if we do, they
probably need better and more consistent naming.

The formatting here also looks weird.

+#ifndef PG_KRB_USER_CCACHE
+#define PG_KRB_USER_CCACHE "MEMORY:"
+#endif

At the risk of stating the obvious, the general idea of a #define is
that you define things in one place and then use the defined symbol
rather than the original value everywhere. This patch takes the
less-useful approach of defining two different symbols for the same
string in different files. This one has this #ifndef/#endif guard here
which I think it probably shouldn't, since the choice of string
probably shouldn't be compile-time configurable, but it also won't
work, because there's no similar guard in the other file.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Mingw task for Cirrus CI

2022-04-08 Thread Melih Mutlu
Hi Andrew,

> You should set MSYSTEM=UCRT64 in the environment section. Given that,
> there should be no need to specify a --host= setting for configure.
>

It's set to UCRT64 on the Docker image side [1]. I didn't know --host isn't
necessary in the UCRT64 environment. I'll remove it then.

 [1]
 
https://github.com/anarazel/pg-vm-images/blob/main/docker/windows_ci_mingw64#L11


Best,
Melih


Size of pg_rewrite (Was: Report checkpoint progress with pg_stat_progress_checkpoint)

2022-04-08 Thread Matthias van de Meent
On Sat, 19 Mar 2022 at 01:15, Andres Freund  wrote:
> pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664
>
> pg_rewrite is the second biggest relation in an empty database already...

Yeah, that's not great. Thanks for nerd-sniping me into looking into
how views and pg_rewrite rules work, that was very interesting and I
learned quite a lot.

# Immediately potential, limited to progress views

I noticed that the CASE-WHEN (used in translating progress stage index
to stage names) in those progress reporting views can be more
efficiently described (although with slightly worse behaviour around
undefined values) using text array lookups (as attached). That
resulted in somewhat smaller rewrite entries for the progress views
(toast compression was good old pglz):

template1=# SELECT sum(octet_length(ev_action)),
SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE
ev_class::regclass::text LIKE '%progress%';

master:
  sum  |  sum
-------+-------
 97277 | 19956
patched:
  sum  |  sum
-------+-------
 77069 | 18417

So this seems like a nice improvement of 20% uncompressed / 7% compressed.

I tested various cases of phase number to text translations: `CASE ..
WHEN`; `(ARRAY[]::text[])[index]` and `('{}'::text[])[index]`. See
results below:

postgres=# create or replace view arrayliteral_view as select
(ARRAY['a','b','c','d','e','f']::text[])[index] as name from tst
s(index);
CREATE VIEW
postgres=# create or replace view stringcast_view as select
('{a,b,c,d,e,f}'::text[])[index] as name from tst s(index);
CREATE VIEW
postgres=# create or replace view split_stringcast_view as select
(('{a,b,' || 'c,d,e,f}')::text[])[index] as name from tst s(index);
CREATE VIEW
postgres=# create or replace view case_view as select case index when
0 then 'a' when 1 then 'b' when 2 then 'c' when 3 then 'd' when 4 then
'e' when 5 then 'f' end as name from tst s(index);
CREATE VIEW


postgres=# postgres=# select ev_class::regclass::text,
octet_length(ev_action), pg_column_size(ev_action) from pg_rewrite
where ev_class in ('arrayliteral_view'::regclass::oid,
'case_view'::regclass::oid, 'split_stringcast_view'::regclass::oid,
'stringcast_view'::regclass::oid);
        ev_class        | octet_length | pg_column_size
------------------------+--------------+----------------
 arrayliteral_view      |         3311 |           1322
 stringcast_view        |         2610 |           1257
 case_view              |         5170 |           1412
 split_stringcast_view  |         2847 |           1350

It seems to me that we could consider replacing the CASE statements
with array literals and lookups if we really value our template
database size. But, as text literal concatenations don't seem to get
constant folded before storing them in the rules table, this rewrite
of the views would result in long lines in the system_views.sql file,
or we'd have to deal with the additional overhead of the append
operator and cast nodes.

# Future work; nodeToString / readNode, all rewrite rules

Additionally, we might want to consider other changes like default (or
empty value) elision in nodeToString, if that is considered a
reasonable option and if we really want to reduce the size of the
pg_rewrite table.

I think a lot of space can be recovered from that: A manual removal of
what seemed to be fields with default values (and the removal of all
query location related fields) in the current definition of
pg_stat_progress_create_index reduces its uncompressed size from
23226B raw and 4204B compressed to 13821B raw and 2784B compressed,
for an on-disk space saving of 33% for this view's ev_action.

Do note, however, that that would add significant branching in the
nodeToString and readNode code, which might slow down that code
significantly. I'm not planning on working on that; but in my opinion
that is a viable path to reducing the size of new database catalogs.


-Matthias

PS. attached patch is not to be considered complete - it is a minimal
example of the array literal form. It fails regression tests because I
didn't bother updating or including the regression tests on system
views.
Index: src/backend/catalog/system_views.sql
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
--- a/src/backend/catalog/system_views.sql  (revision 
32723e5fabcc7db1bf4e897baaf0d251b500c1dc)
+++ b/src/backend/catalog/system_views.sql  (date 1649160138886)
@@ -1120,13 +1120,7 @@
 SELECT
 S.pid AS pid, S.datid AS datid, D.datname AS datname,
 CAST(S.relid AS oid) AS relid,
-CASE S.param1 WHEN 0 THEN 'initializing'
-  WHEN 1 THEN 'acquiring sample rows'
-  WHEN 2 THEN 'acquiring inherited sample rows'
-  WHEN 3 THEN 'computing statistics'
-  WHEN 4 THEN 

Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery

2022-04-08 Thread Thom Brown
On Fri, 8 Apr 2022, 14:36 Robert Haas,  wrote:

> On Wed, Apr 6, 2022 at 8:15 AM Robert Haas  wrote:
> > On Tue, Apr 5, 2022 at 8:43 PM Thom Brown  wrote:
> > > I share your discomfort with the wording.  How about:
> > >
> > > WAL records must be kept on standby until they are ready to be applied.
> > > Therefore, longer delays will result in a greater accumulation of WAL
> files,
> > > increasing disk space requirements for the standby's
> pg_wal
> > > directory.
> >
> > Looks awesome.
>
> Here that is in patch form. I feel that the feature freeze should not
> preclude committing this documentation improvement, but if someone
> feels otherwise, then I will leave this until the tree reopens.
>

Thanks. This doesn't include my self-correction:

s/kept on standby/kept on the standby/

Thom

>


Re: avoid multiple hard links to same WAL file after a crash

2022-04-08 Thread Robert Haas
On Thu, Apr 7, 2022 at 2:30 PM Nathan Bossart  wrote:
> Presently, WAL recycling uses durable_rename_excl(), which notes that a
> crash at an unfortunate moment can result in two links to the same file.
> My testing [1] demonstrated that it was possible to end up with two links
> to the same file in pg_wal after a crash just before unlink() during WAL
> recycling.  Specifically, the test produced links to the same file for the
> current WAL file and the next one because the half-recycled WAL file was
> re-recycled upon restarting.  This seems likely to lead to WAL corruption.

Wow, that's bad.

> The attached patch prevents this problem by using durable_rename() instead
> of durable_rename_excl() for WAL recycling.  This removes the protection
> against accidentally overwriting an existing WAL file, but there shouldn't
> be one.

I see that durable_rename_excl() has the following comment: "Similar
to durable_rename(), except that this routine tries (but does not
guarantee) not to overwrite the target file." If those are the desired
semantics, we could achieve them more simply and more safely by just
trying to stat() the target file and then, if it's not found, calling
durable_rename(). I think that would be a heck of a lot safer than
what this function is doing right now.
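
For illustration, a minimal sketch of that stat-then-rename idea (not the
actual backend code; plain rename() stands in for durable_rename(), which
additionally fsyncs the files and their containing directory):

/*
 * Sketch only: skip the rename when the target already exists, otherwise
 * rename.  In the backend, durable_rename() would be used instead of
 * rename(), and errors would be reported through ereport().
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static int
rename_unless_target_exists(const char *oldfile, const char *newfile)
{
	struct stat st;

	if (stat(newfile, &st) == 0)
	{
		/* Target exists: "try not to overwrite", so do nothing. */
		fprintf(stderr, "not renaming \"%s\": \"%s\" already exists\n",
				oldfile, newfile);
		return -1;
	}
	if (errno != ENOENT)
	{
		fprintf(stderr, "could not stat \"%s\": %s\n",
				newfile, strerror(errno));
		return -1;
	}

	/* Target is missing; here the backend would call durable_rename(). */
	if (rename(oldfile, newfile) != 0)
	{
		fprintf(stderr, "could not rename \"%s\" to \"%s\": %s\n",
				oldfile, newfile, strerror(errno));
		return -1;
	}
	return 0;
}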

I'd actually be in favor of nuking durable_rename_excl() from orbit
and putting the file-exists tests in the callers. Otherwise, someone
might assume that it actually has the semantics that its name
suggests, which could be pretty disastrous. If we don't want to do
that, then I'd changing to do the stat-then-durable-rename thing
internally, so we don't leave hard links lying around in *any* code
path. Perhaps that's the right answer for the back-branches in any
case, since there could be third-party code calling this function.

Your proposed fix is OK if we don't want to do any of that stuff, but
personally I'm much more inclined to blame durable_rename_excl() for
being horrible than I am to blame the calling code for using it
improvidently.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: REINDEX blocks virtually any queries but some prepared queries.

2022-04-08 Thread Frédéric Yhuel




On 4/8/22 02:22, Michael Paquier wrote:

On Thu, Apr 07, 2022 at 05:29:36PM +0200, Guillaume Lelarge wrote:

On Thu, Apr 7, 2022 at 15:44, Frédéric Yhuel wrote:

On 4/7/22 14:40, Justin Pryzby wrote:
Thank you Justin! I applied your fixes in the v2 patch (attached).


v2 patch sounds good.


The location of the new sentence and its wording seem fine to me.  So
no objections from me to add what's suggested, as suggested.  I'll
wait for a couple of days first.



Thank you Michael.


Indeed ;) That being said, REINDEX CONCURRENTLY could give you an
invalid index, so sometimes you may be tempted to go for a simpler
REINDEX, especially if you believe that the SELECTs won't be blocked.


Agreed.


There are many factors to take into account, one is more expensive
than the other in terms of resources and has downsides, downsides
compensated by the reduction in the lock requirements.  There are
cases where REINDEX is a must-have, as CONCURRENTLY does not support
catalog indexes, and these tend to be easily noticed when corruption
spreads around.


Indeed!

Best regards,
Frédéric




Re: remove more archiving overhead

2022-04-08 Thread Robert Haas
On Thu, Apr 7, 2022 at 6:23 PM Nathan Bossart  wrote:
> On Thu, Feb 24, 2022 at 09:55:53AM -0800, Nathan Bossart wrote:
> > Yes.  I found that a crash at an unfortunate moment can produce multiple
> > links to the same file in pg_wal, which seemed bad independent of archival.
> > By fixing that (i.e., switching from durable_rename_excl() to
> > durable_rename()), we not only avoid this problem, but we also avoid trying
> > to archive a file the server is concurrently writing.  Then, after a crash,
> > the WAL file to archive should either not exist (which is handled by the
> > archiver) or contain the same contents as any preexisting archives.
>
> I moved the fix for this to a new thread [0] since I think it should be
> back-patched.  I've attached a new patch that only contains the part
> related to reducing archiving overhead.

While we're now after the feature freeze and thus this will need to
wait for v16, it looks like a reasonable change to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




GSOC: New and improved website for pgjdbc (JDBC) (2022)

2022-04-08 Thread S.R Keshav



postgreSql_ New and improved website for pgjdbc (JDBC) (2022).pdf
Description: Adobe PDF document


Re: Atomic rename feature for Windows.

2022-04-08 Thread Greg Stark
On Thu, 9 Dec 2021 at 23:36, Tom Lane  wrote:
>
> I'm not for dropping support for some platform just because it's old.

I guess I'll have to spin up the Vax again :)




Re: why pg_walfile_name() cannot be executed during recovery?

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 9:31 AM Bharath Rupireddy
 wrote:
> Fundamental question - should the pg_walfile_{name, name_offset} check
> whether the file with the computed WAL file name exists on the server
> right now or ever existed earlier? Right now, they don't do that, see
> [1].

I don't think that checking whether the file exists is the right
approach. However, I do think that it's important to be precise about
which TLI is going to be used. I think it would be reasonable to
redefine this function (on both the primary and the standby) so that
the TLI that is used is the one that was in effect at the time record
at the given LSN was either written or replayed. Then, you could
potentially use this function to figure out whether you still have the
WAL files that are needed to replay up to some previous point in the
WAL stream. However, what about the segments where we switched from
one TLI to the next in the middle of the segment? There, you probably
need both the old and the new segments, or maybe if you're trying to
stream them you only need the new one because we have some weird
special case that will send the segment from the new timeline when the
segment from the old timeline is requested. So you couldn't just call
this function on one LSN per segment and call it good, and it wouldn't
necessarily be the case that the filenames you got back were exactly
the ones you needed.
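
To make the timeline dependence concrete, here is a small self-contained
sketch (assuming the default 16 MB segment size) of how a segment file name
is derived from a TLI and an LSN; for the LSN F/0 above it produces different
names under TLI 1 and TLI 2, and the TLI 2 name is one that never existed on
disk in that scenario:

/*
 * Sketch only: derive a WAL segment file name from a timeline ID and an
 * LSN, assuming 16 MB (2^24-byte) segments.  The name is three 8-digit
 * hex fields: TLI, then the segment number split into two parts.
 */
#include <inttypes.h>
#include <stdio.h>

#define WAL_SEGMENT_SIZE	(UINT64_C(16) * 1024 * 1024)
#define SEGMENTS_PER_XLOGID	(UINT64_C(0x100000000) / WAL_SEGMENT_SIZE)

static void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t lsn)
{
	uint64_t	segno = lsn / WAL_SEGMENT_SIZE;

	snprintf(buf, len, "%08X%08X%08X",
			 (unsigned) tli,
			 (unsigned) (segno / SEGMENTS_PER_XLOGID),
			 (unsigned) (segno % SEGMENTS_PER_XLOGID));
}

int
main(void)
{
	char		name[64];
	uint64_t	lsn = UINT64_C(0xF) << 32;	/* LSN F/0 */

	wal_file_name(name, sizeof(name), 1, lsn);
	printf("TLI 1: %s\n", name);	/* 000000010000000F00000000 */
	wal_file_name(name, sizeof(name), 2, lsn);
	printf("TLI 2: %s\n", name);	/* same LSN, different file name */
	return 0;
}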

So I'm not entirely sure this proposal is good enough, but it at least
would have the advantage of meaning that the filename you get back is
one that existed at some point in time and somebody used it for
something.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Can we automatically add elapsed times to tap test log?

2022-04-08 Thread Andrew Dunstan

On 4/7/22 19:55, Andrew Dunstan wrote:
> On 4/7/22 17:58, Andres Freund wrote:
>> Hi,
>>
>> On 2022-04-07 17:45:09 -0400, Tom Lane wrote:
>>> Andres Freund  writes:
 On 2022-04-07 17:21:09 -0400, Tom Lane wrote:
> I too think that the elapsed time is useful.  I'm less convinced
> that the time-of-day marker is useful.
 I think it'd be quite useful if it had more precision - it's a pita to
 correlate regress_log_* output with server logs.
>>> Fair point.  Maybe we could keep the timestamp (with ms precision
>>> if possible) and then the parenthetical bit is time-since-last-line
>>> (also with ms precision)?  I think that would more or less satisfy
>>> both uses.
>> Would work for me...
>>
> All doable. Time::HiRes gives us a higher resolution timer. I'll post a
> new version in a day or two.


New version attached.


Sample traces:


andrew@emma:log $ egrep '^\[[0-9][0-9]:[00-9][0-9]:' 
regress_log_020_pg_receivewal | tail -n 15
[09:22:45.031](0.000s) ok 30 # skip postgres was not built with LZ4 support
[09:22:45.032](0.000s) ok 31 # skip postgres was not built with LZ4 support
[09:22:45.296](0.265s) ok 32 - streaming some WAL
[09:22:45.297](0.001s) ok 33 - check that previously partial WAL is now complete
[09:22:45.298](0.001s) ok 34 - check stream dir permissions
[09:22:45.298](0.000s) # Testing pg_receivewal with slot as starting streaming 
point
[09:22:45.582](0.284s) ok 35 - pg_receivewal fails with non-existing slot: exit 
code not 0
[09:22:45.583](0.001s) ok 36 - pg_receivewal fails with non-existing slot: 
matches
[09:22:45.618](0.036s) ok 37 - WAL streamed from the slot's restart_lsn
[09:22:45.619](0.001s) ok 38 - WAL from the slot's restart_lsn has been archived
[09:22:46.597](0.978s) ok 39 - Stream some wal after promoting, resuming from 
the slot's position
[09:22:46.598](0.001s) ok 40 - WAL segment 0001000B archived 
after timeline jump
[09:22:46.598](0.000s) ok 41 - WAL segment 0002000C archived 
after timeline jump
[09:22:46.598](0.000s) ok 42 - timeline history file archived after timeline 
jump
[09:22:46.599](0.001s) 1..42


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/test/perl/PostgreSQL/Test/SimpleTee.pm b/src/test/perl/PostgreSQL/Test/SimpleTee.pm
index bb9d79a755..d92c31a891 100644
--- a/src/test/perl/PostgreSQL/Test/SimpleTee.pm
+++ b/src/test/perl/PostgreSQL/Test/SimpleTee.pm
@@ -10,10 +10,31 @@
 # method is currently implemented; that's all we need. We don't want to
 # depend on IO::Tee just for this.
 
+# The package is enhanced to add timestamp and elapsed time decorations to
+# the log file traces sent through this interface from Test::More.
+
 package PostgreSQL::Test::SimpleTee;
 use strict;
 use warnings;
 
+use Time::HiRes qw(time);
+
+my $last_time;
+
+BEGIN { $last_time = time; }
+
+sub _time_str
+{
+	my $tm = time;
+	my $diff = $tm - $last_time;
+	$last_time = $tm;
+my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) =
+  localtime($tm);
+	my $usec = int(1000 * ($tm - int($tm)));
+return sprintf("[%.2d:%.2d:%.2d.%.3d](%.3fs) ",
+   $hour, $min, $sec, $usec, $diff);
+}
+
 sub TIEHANDLE
 {
 	my $self = shift;
@@ -24,10 +45,16 @@ sub PRINT
 {
 	my $self = shift;
 	my $ok   = 1;
+	# The first file argument passed to tiehandle in PostgreSQL::Test::Utils is
+	# the original stdout, which is what PROVE sees. Additional decorations
+	# confuse it, so only put out the time string on files after the first.
+	my $skip = 1;
+	my $ts = _time_str;
 	for my $fh (@$self)
 	{
-		print $fh @_ or $ok = 0;
+		print $fh ($skip ? "" : $ts), @_ or $ok = 0;
 		$fh->flush   or $ok = 0;
+		$skip = 0;
 	}
 	return $ok;
 }


Re: GSoC: New and improved website for pgjdbc (JDBC)

2022-04-08 Thread Dave Cramer
Joseph



On Thu, 7 Apr 2022 at 17:49, Joseph Ho  wrote:

> Hello,
>
> I am Joseph Ho, a senior at Dr Norman Bethune Collegiate Institute
> interested in going into computer science. I am interested in working to
> create and improve the website for pgjdbc during GSoC 2022.
>
> I am wondering how the draft proposal should be made. Will I need to
> submit a web design of the new and improved website or will I need to
> submit something else? Also, am I able to use a web framework of my choice
> or is there one that you prefer that we use?
>

You should register on the GSoC site Contributor Registration | Google
Summer of Code 

The draft proposal can just be a document at this point which outlines your
ideas

Currently the site is built using Jekyll, and I see no good reason to
change. What I am looking for is to update Jekyll to the latest version
and make the site cleaner with a new design.

You should be aware that proposals have to be in by the 19th of April


Regards,

Dave

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery

2022-04-08 Thread Magnus Hagander
On Fri, Apr 8, 2022 at 3:36 PM Robert Haas  wrote:

> On Wed, Apr 6, 2022 at 8:15 AM Robert Haas  wrote:
> > On Tue, Apr 5, 2022 at 8:43 PM Thom Brown  wrote:
> > > I share your discomfort with the wording.  How about:
> > >
> > > WAL records must be kept on standby until they are ready to be applied.
> > > Therefore, longer delays will result in a greater accumulation of WAL
> files,
> > > increasing disk space requirements for the standby's
> pg_wal
> > > directory.
> >
> > Looks awesome.
>
> Here that is in patch form. I feel that the feature freeze should not
> preclude committing this documentation improvement, but if someone
> feels otherwise, then I will leave this until the tree reopens.
>

We normally allow documentation and bug fixes after the feature freeze.
(It's only in the "we're about to wrap the release right now"-freeze that
we have to avoid those)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: How to simulate sync/async standbys being closer/farther (network distance) to primary in core postgres?

2022-04-08 Thread Bharath Rupireddy
On Wed, Apr 6, 2022 at 4:30 PM Ashutosh Bapat
 wrote:
>
> On Tue, Apr 5, 2022 at 9:23 PM Bharath Rupireddy
>  wrote:
> >
> > Hi,
> >
> > I'm thinking if there's a way in core postgres to achieve $subject. In
> > reality, the sync/async standbys can either be closer/farther (which
> > means sync/async standbys can receive WAL at different times) to
> > primary, especially in cloud HA environments with primary in one
> > Availability Zone(AZ)/Region and standbys in different AZs/Regions.
> > $subject may not be possible on dev systems (say, for testing some HA
> > features) unless we can inject a delay in WAL senders before sending
> > WAL.
> >
> > How about having two developer-only GUCs {async,
> > sync}_wal_sender_delay? When set, the async and sync WAL senders will
> > delay sending WAL by {async, sync}_wal_sender_delay
> > milliseconds/seconds? Although, I can't think of any immediate use, it
> > will be useful someday IMO, say for features like [1], if it gets in.
> > With this set of GUCs, one can even add core regression tests for HA
> > features.
> >
> > Thoughts?
>
> I think this is a common problem, people run into. Once way to
> simulate network delay is what you suggest, yes. But I was wondering
> if there are tools/libraries that can help us to do that. Googling
> gives OS specific tools but nothing like a C or perl library which can
> be used for this purpose.

Thanks. IMO, non-postgres tools (if such tools exist at all) to simulate
network delays may not be reliable or easy to use, say, for adding some
TAP tests for HA features. Especially in cloud environments, using those
external tools may not even be possible. With the developer-only GUCs
proposed here in this thread, it's pretty easy to simulate what we want;
the only extra caution is to not let others (probably non-superusers) set
and misuse these developer-only GUCs. I think that's equally true for all
the existing developer-only GUCs.

Thoughts?

Regards,
Bharath Rupireddy.




Re: Add parameter jit_warn_above_fraction

2022-04-08 Thread Stephen Frost
Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Fri, Apr 8, 2022 at 2:19 PM David Rowley  wrote:
> > On Fri, 8 Apr 2022 at 23:27, Magnus Hagander  wrote:
> > > On Wed, Mar 30, 2022 at 3:04 PM Magnus Hagander 
> > wrote:
> > >>
> > >> On Tue, Mar 29, 2022 at 10:06 PM David Rowley 
> > wrote:
> > >>>
> > >>> If we go with this patch,  the problem I see here is that the amount
> > >>> of work the JIT compiler must do for a given query depends mostly on
> > >>> the number of expressions that must be compiled in the query (also to
> > >>> a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
> > >>> jit_tuple_deforming and jit_expressions). The DBA does not really have
> > >>> much control over the number of expressions in the query.  All he or
> > >>> she can do to get rid of the warning is something like increase
> > >>> jit_above_cost.  After a few iterations of that, the end result is
> > >>> that jit_above_cost is now high enough that JIT no longer triggers
> > >>> for, say, that query to that table with 1000 partitions where no
> > >>> plan-time pruning takes place.  Is that really a good thing? It likely
> > >>> means that we just rarely JIT anything at all!
> > >>
> > >>
> > >> I don't agree with the conclusion of that.
> > >>
> > >> What the parameter would be useful for is to be able to tune those
> > costs (or just turn it off) *for that individual query*. That doesn't mean
> > you "rarely JIT anything atll", it just means you rarely JIT that
> > particular query.
> >
> > I just struggle to imagine that anyone is going to spend much effort
> > tuning a warning parameter per query.  I imagine they're far more
> > likely to just ramp it up to only catch some high percentile problems
> > or just (more likely) just not bother with it.  It seems more likely
> > that if anyone was to tune anything per query here it would be
> > jit_above_cost, since that actually might have an affect on the
> > performance of the query, rather than if it spits out some warning
> > message or not.  ISTM that if the user knows what to set it to per
> > query, then there's very little point in having a warning as we'd be
> > alerting them to something they already know about.
> 
> I would not expect people to tune the *warning* at a query level. If
> anything, then yes, they would tune either jit_above_cost or just
> jit=off. But the idea being you can do that on a per query level instead of
> globally.

Yeah, exactly, this is about having a busy system and wanting to know
which queries are spending a lot of time doing JIT relative to the query
time, so that you can go adjust your JIT parameters or possibly disable
JIT for those queries (or maybe bring those cases to -hackers and try to
help make our costing better).

> > I looked in the -general list to see if we could get some common
> > explanations to give us an idea of the most common reason for high JIT
> > compilation time. It seems that the plans were never simple. [1] seems
> > due to a complex plan. I'm basing that off the "Functions: 167". I
> > didn't catch the full plan. From what I can tell, [2] seems to be due
> > to "lots of empty tables", so assuming the clamping at 1 page is
> > causing issues there.  I think both of those cases could be resolved
> > by building the costing the way I mentioned.  I admit that 2 cases is
> > not a very large sample size.
> 
> Again, I am very much for improvements of the costing model. This is in no
> way intended to be a replacement for that. It's intended to be a stop-gap.

Not sure I'd say it's a 'stop-gap' as it's really very similar, imv
anyway, to log_min_duration_statement- you want to know what queries are
taking a lot of time but you can't log all of them.

> What I see much of today are things like
> https://dba.stackexchange.com/questions/264955/handling-performance-problems-with-jit-in-postgres-12
> or
> https://dev.to/xenatisch/cascade-of-doom-jit-and-how-a-postgres-update-led-to-70-failure-on-a-critical-national-service-3f2a
> 
> The bottom line is that people end up with recommendations to turn off JIT
> globally more or less by default. Because there's no real useful way today
> to figure out when it causes problems vs when it helps.

Yeah, that's frustrating.

> The addition to pg_stat_statements I pushed a short while ago would help
> with that. But I think having a warning like this would also be useful. As
> a stop-gap measure, yes, but we really don't know when we will have an
> improved costing model for it. I hope you're right and that we can have it
> by 16, and then I will definitely advocate for removing the warning again
> if it works.

Having this in pg_stat_statements is certainly helpful but having a
warning also is.  I don't think we have to address this in only one way.
A lot faster to flip this guc and then look in the logs on a busy system
than to install pg_stat_statements, restart the cluster once you get
permission to do so, and then query it.

Thanks,

Stephen

Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery

2022-04-08 Thread Robert Haas
On Wed, Apr 6, 2022 at 8:15 AM Robert Haas  wrote:
> On Tue, Apr 5, 2022 at 8:43 PM Thom Brown  wrote:
> > I share your discomfort with the wording.  How about:
> >
> > WAL records must be kept on standby until they are ready to be applied.
> > Therefore, longer delays will result in a greater accumulation of WAL files,
> > increasing disk space requirements for the standby's pg_wal
> > directory.
>
> Looks awesome.

Here that is in patch form. I feel that the feature freeze should not
preclude committing this documentation improvement, but if someone
feels otherwise, then I will leave this until the tree reopens.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-docs-Note-the-recovery_min_apply_delay-bloats-pg_.patch
Description: Binary data


Re: Support for grabbing multiple consecutive values with nextval()

2022-04-08 Thread Greg Stark
On Sun, 27 Feb 2022 at 07:09, Jille Timmermans  wrote:
>
> Hi,
>
> First time PostgreSQL contributor here :)

I wish I had noticed this patch during the CF. It seems like a nice
self-contained feature that could have been easily reviewed and
committed and it's always good to see first-time contributions.
Hopefully it'll get committed early in the next cycle.


-- 
greg




Re: why pg_walfile_name() cannot be executed during recovery?

2022-04-08 Thread Bharath Rupireddy
On Thu, Apr 7, 2022 at 9:07 PM Robert Haas  wrote:
>
> On Thu, Apr 7, 2022 at 9:32 AM Bharath Rupireddy
>  wrote:
> > I spent some time today to allow pg_walfile_{name, name_offset} run in
> > recovery. Timeline ID is computed while in recovery as follows - WAL
> > receiver's last received and flushed WAL record's TLI if it's
> > streaming, otherwise the last replayed WAL record's TLI. This way,
> > these functions can be used on standby or PITR server or even in crash
> > recovery if the server opens up for read-only connections.
>
> I don't think this is a good definition. Suppose I ask for
> pg_walfile_name() using an older LSN. With this approach, we're going
> to get a filename based on the idea that the TLI that was in effect
> back then is the same one as the TLI that is in effect now, which
> might not be true. For example, suppose that the current TLI is 2 and
> it branched off of timeline 1 at 10/0. If I ask for
> pg_walfile_name('F/0'), it's going to give me the name of a WAL file
> that has never existed. That seems bad.
>
> It's also worth noting that there's a bit of a definitional problem
> here. If in the same situation, I ask for pg_walfile_name('11/0'),
> it's going to give me a filename based on TLI 2, but there's also a
> WAL file for that LSN with TLI 1. How do we know which one the user
> wants? Perhaps one idea would be to say that the relevant TLI is the
> one which was in effect at the time that LSN was replayed. If we do
> that, what about future LSNs? We could assume that for future LSNs,
> the TLI should be the same as the current TLI, but maybe that's also
> misleading, because recovery_target_timeline could be set.

Fundamental question - should the pg_walfile_{name, name_offset} check
whether the file with the computed WAL file name exists on the server
right now or ever existed earlier? Right now, they don't do that, see
[1].

I think we can make the functions more robust:
pg_walfile_{name, name_offset}(lsn, check_if_file_exists = false, tli
= invalid_timelineid) - when check_if_file_exists is true, check whether
the computed WAL file exists; when a valid tli is provided, use it when
computing the WAL file name. When tli isn't provided, it continues to use
the insert TLI on the primary, and in recovery it uses the TLI as proposed
in my patch. Perhaps it can also do this (as Michael suggested): if
check_if_file_exists is true, tli isn't provided, and there is timeline
history, then it can look through all the timelines and check whether the
file exists with the name computed from each historical TLI.

> I think it's really important to start by being precise about the
> question that we think pg_walfile_name() ought to be answering. If we
> don't know that, then we really can't say what TLI it should be using.
> It's not hard to make the function return SOME answer using SOME TLI,
> but then it's not clear that the answer is the right one for any
> particular purpose. And in that case the function is more dangerous
> than useful, because people will write code that uses it to do stuff,
> and then that stuff won't actually work correctly under all
> circumstances.

Yes, once we agree on the semantics of these functions, having better
documentation will help.

Thoughts?

[1]
postgres=# select * from pg_walfile_name('5/dfdf');
     pg_walfile_name
--------------------------
 000000010000000500000000
(1 row)
postgres=# select * from pg_walfile_name_offset('5/dfdf');
        file_name         | file_offset
--------------------------+-------------
 000000010000000500000000 |       57311
(1 row)

Regards,
Bharath Rupireddy.




Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Robert Haas
On Thu, Apr 7, 2022 at 7:01 PM Andres Freund  wrote:
> On 2022-04-04 19:24:22 -0700, Peter Geoghegan wrote:
> > We should definitely increase MaxHeapTuplesPerPage before too long,
> > for a variety of reasons that I have talked about in the past. Its
> > current value is 291 on all mainstream platforms, a value that's
> > derived from accidental historic details -- which predate HOT.
>
> I'm on-board with that - but I think we should rewrite a bunch of places that
> use MaxHeapTuplesPerPage sized-arrays on the stack first. It's not great using
> several KB of stack at the current the current value already (*), but if it 
> grows
> further...

I agree that the value of 291 is pretty much accidental, but it also
seems fairly generous to me. The bigger you make it, the more space
you can waste. I must have missed (or failed to understand) previous
discussions about why raising it would be a good idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Support logical replication of DDLs

2022-04-08 Thread Robert Haas
On Fri, Apr 8, 2022 at 7:34 AM Alvaro Herrera  wrote:
> > For runtime conditions, one of the things you have mentioned in that
> > thread is to add schema name in the statement at the required places
> > which this patch deals with in a different way by explicitly sending
> > it along with the DDL statement.
>
> Hmm, ok.  The point of the JSON-blob route is that the publisher sends a
> command representation that can be parsed/processed/transformed
> arbitrarily by the subscriber using generic rules; it should be trivial
> to use a JSON tool to change schema A to schema B in any arbitrary DDL
> command, and produce another working DDL command without having to know
> how to write that command specifically.  So if I have a rule that
> "schema A there is schema B here", all DDL commands can be replayed with
> no further coding (without having to rely on getting the run-time
> search_path correct.)

Yeah, that was a really nice aspect of that approach.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Extract epoch from Interval weird behavior

2022-04-08 Thread Tom Lane
Peter Eisentraut  writes:
> We really wanted to avoid doing calculations in numeric as much as 
> possible.  So we should figure out a different way to write this.  The 
> attached patch works for me.  It's a bit ugly since it hardcodes some 
> factors.  Maybe we can rephrase it a bit more elegantly.

I think it's fine but needs some commentary.  Maybe about like
"To do this calculation in integer arithmetic even though
DAYS_PER_YEAR is fractional, multiply everything by 4
and then divide by 4 again at the end.  This relies on
DAYS_PER_YEAR being a multiple of 0.25 and on SECS_PER_DAY
being a multiple of 4."

BTW, it might be good to parenthesize as

(... big calculation ...) * (SECS_PER_DAY/4)

to eliminate any question of whether the value could overflow
before the final division by 4.

regards, tom lane
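
For concreteness, a standalone sketch of the multiply-by-4 approach being
discussed, using PostgreSQL's usual constants (DAYS_PER_YEAR = 365.25,
DAYS_PER_MONTH = 30, MONTHS_PER_YEAR = 12, SECS_PER_DAY = 86400); this is an
illustration, not the committed code:

/*
 * Sketch only: compute the seconds contributed by the months/days fields
 * of an interval in pure integer arithmetic.  Multiplying by 4 makes
 * 4 * DAYS_PER_YEAR an integer (1461), and the division by 4 happens via
 * the final (SECS_PER_DAY / 4) factor, parenthesized as suggested above.
 */
#include <stdint.h>
#include <stdio.h>

#define MONTHS_PER_YEAR		12
#define DAYS_PER_MONTH		30
#define SECS_PER_DAY		86400
#define FOUR_DAYS_PER_YEAR	1461	/* 4 * 365.25 */

static int64_t
interval_months_days_to_secs(int32_t months, int32_t days)
{
	return ((int64_t) FOUR_DAYS_PER_YEAR * (months / MONTHS_PER_YEAR) +
			(int64_t) 4 * DAYS_PER_MONTH * (months % MONTHS_PER_YEAR) +
			(int64_t) 4 * days) * (SECS_PER_DAY / 4);
}

int
main(void)
{
	/* 1 year 2 months 3 days: (365.25 + 60 + 3) * 86400 = 37000800 */
	printf("%lld\n", (long long) interval_months_days_to_secs(14, 3));
	return 0;
}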




Re: Mark all GUC variable as PGDLLIMPORT

2022-04-08 Thread Magnus Hagander
On Fri, Apr 8, 2022 at 2:42 PM Robert Haas  wrote:

> On Wed, Apr 6, 2022 at 7:56 PM Michael Paquier 
> wrote:
> > On Wed, Apr 06, 2022 at 12:57:29AM +0700, John Naylor wrote:
> > > For these two patches, I'd say a day or two after feature freeze is a
> > > reasonable goal.
> >
> > Yeah.  For patches as invasive as the PGDLLIMPORT business and the
> > frontend error refactoring, I am also fine to have two exceptions with
> > the freeze deadline.
>
> Done now.
>

\o/

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Commitfest Closed

2022-04-08 Thread Alvaro Herrera
On 2022-Apr-08, Greg Stark wrote:

> Thanks to all the reviewers and committers who put a lot of work in,
> especially in the last two weeks. I especially want to thank Andres
> who showed me how to use the cfbot to check on patch statuses and did
> a lot of work doing that until I was up to speed.

Thanks for herding through the CF!

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Commitfest Closed

2022-04-08 Thread Greg Stark
It has reached 2022-04-08 Anywhere on Earth[*] so I believe that means
Postgres 15 Feature Freeze is in effect (modulo a couple patches that
were held until the end of the commitfest to make merging easier).

I've marked the commitfest closed and will be moving any patches that
didn't receive feedback over to the next commitfest. I think this is
most of the remaining patches though there may be a few Waiting for
Author patches that can be Returned with Feedback or even Rejected.
I'll do the Ready for Committer patches last to allow for the
stragglers held back.

It's always frustrating seeing patches get ignored but on the plus
side nearly 100 patches are marked Committed and a lot of patches did
get feedback.

Thanks to all the reviewers and committers who put a lot of work in,
especially in the last two weeks. I especially want to thank Andres
who showed me how to use the cfbot to check on patch statuses and did
a lot of work doing that until I was up to speed.

[*] https://www.timeanddate.com/time/zones/aoe

-- 
greg




Re: Defer selection of asynchronous subplans until the executor initialization stage

2022-04-08 Thread Justin Pryzby
On Wed, Apr 06, 2022 at 03:58:29PM +0900, Etsuro Fujita wrote:
> I have committed the patch after modifying it as such.  (I think we
> can improve these later, if necessary.)

This patch seems to be causing the planner to crash.
Here's a query reduced from sqlsmith.

| explain SELECT 1 FROM information_schema.constraint_column_usage WHERE 1 <= 
pg_trigger_depth();

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x55b4396a2edf in trivial_subqueryscan (plan=0x7f4219ed93b0) at 
../../../../src/include/nodes/pg_list.h:151
151 return l ? l->length : 0;
(gdb) bt
#0  0x55b4396a2edf in trivial_subqueryscan (plan=0x7f4219ed93b0) at 
../../../../src/include/nodes/pg_list.h:151
#1  0x55b43968af89 in mark_async_capable_plan 
(plan=plan@entry=0x7f4219ed93b0, path=path@entry=0x7f4219e89538) at 
createplan.c:1132
#2  0x55b439691924 in create_append_plan (root=root@entry=0x55b43affb2b0, 
best_path=best_path@entry=0x7f4219ed0cb8, flags=flags@entry=0) at 
createplan.c:1329
#3  0x55b43968fa21 in create_plan_recurse (root=root@entry=0x55b43affb2b0, 
best_path=best_path@entry=0x7f4219ed0cb8, flags=flags@entry=0) at 
createplan.c:421
#4  0x55b43968f974 in create_projection_plan 
(root=root@entry=0x55b43affb2b0, best_path=best_path@entry=0x7f4219ed0f60, 
flags=flags@entry=1) at createplan.c:2039
#5  0x55b43968fa6f in create_plan_recurse (root=root@entry=0x55b43affb2b0, 
best_path=0x7f4219ed0f60, flags=flags@entry=1) at createplan.c:433
#6  0x55b439690221 in create_plan (root=root@entry=0x55b43affb2b0, 
best_path=<optimized out>) at createplan.c:348
#7  0x55b4396a1451 in standard_planner (parse=0x55b43af05e28, 
query_string=<optimized out>, cursorOptions=2048, boundParams=0x0) at 
planner.c:413
#8  0x55b4396a19c1 in planner (parse=parse@entry=0x55b43af05e28, 
query_string=query_string@entry=0x55b43af04c40 "SELECT 1 FROM 
information_schema.constraint_column_usage WHERE 1 > pg_trigger_depth();", 
cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) 
at planner.c:277
#9  0x55b439790c78 in pg_plan_query 
(querytree=querytree@entry=0x55b43af05e28, 
query_string=query_string@entry=0x55b43af04c40 "SELECT 1 FROM 
information_schema.constraint_column_usage WHERE 1 > pg_trigger_depth();", 
cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) 
at postgres.c:883
#10 0x55b439790d54 in pg_plan_queries (querytrees=0x55b43afdd528, 
query_string=query_string@entry=0x55b43af04c40 "SELECT 1 FROM 
information_schema.constraint_column_usage WHERE 1 > pg_trigger_depth();", 
cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) 
at postgres.c:975
#11 0x55b439791239 in exec_simple_query 
(query_string=query_string@entry=0x55b43af04c40 "SELECT 1 FROM 
information_schema.constraint_column_usage WHERE 1 > pg_trigger_depth();") at 
postgres.c:1169
#12 0x55b439793183 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4542
#13 0x55b4396e6af7 in BackendRun (port=port@entry=0x55b43af2ffe0) at 
postmaster.c:4489
#14 0x55b4396e9c03 in BackendStartup (port=port@entry=0x55b43af2ffe0) at 
postmaster.c:4217
#15 0x55b4396e9e4a in ServerLoop () at postmaster.c:1791
#16 0x55b4396eb401 in PostmasterMain (argc=7, argv=<optimized out>) at 
postmaster.c:1463
#17 0x55b43962b4df in main (argc=7, argv=0x55b43aeff0c0) at main.c:202

Actually, the original query failed like this:
#2  0x55b4398e9f90 in ExceptionalCondition 
(conditionName=conditionName@entry=0x55b439a61238 "plan->scanstatus == 
SUBQUERY_SCAN_UNKNOWN", errorType=errorType@entry=0x55b43994b00b 
"FailedAssertion", 
#3  0x55b4396a2ecf in trivial_subqueryscan (plan=0x55b43b59cac8) at 
setrefs.c:1367




Re: Mark all GUC variable as PGDLLIMPORT

2022-04-08 Thread Robert Haas
On Wed, Apr 6, 2022 at 7:56 PM Michael Paquier  wrote:
> On Wed, Apr 06, 2022 at 12:57:29AM +0700, John Naylor wrote:
> > For these two patches, I'd say a day or two after feature freeze is a
> > reasonable goal.
>
> Yeah.  For patches as invasive as the PGDLLIMPORT business and the
> frontend error refactoring, I am also fine to have two exceptions with
> the freeze deadline.

Done now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add parameter jit_warn_above_fraction

2022-04-08 Thread Magnus Hagander
On Fri, Apr 8, 2022 at 2:19 PM David Rowley  wrote:

> On Fri, 8 Apr 2022 at 23:27, Magnus Hagander  wrote:
> >
> >
> >
> > On Wed, Mar 30, 2022 at 3:04 PM Magnus Hagander 
> wrote:
> >>
> >> On Tue, Mar 29, 2022 at 10:06 PM David Rowley 
> wrote:
> >>>
> >>> If we go with this patch,  the problem I see here is that the amount
> >>> of work the JIT compiler must do for a given query depends mostly on
> >>> the number of expressions that must be compiled in the query (also to
> >>> a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
> >>> jit_tuple_deforming and jit_expressions). The DBA does not really have
> >>> much control over the number of expressions in the query.  All he or
> >>> she can do to get rid of the warning is something like increase
> >>> jit_above_cost.  After a few iterations of that, the end result is
> >>> that jit_above_cost is now high enough that JIT no longer triggers
> >>> for, say, that query to that table with 1000 partitions where no
> >>> plan-time pruning takes place.  Is that really a good thing? It likely
> >>> means that we just rarely JIT anything at all!
> >>
> >>
> >> I don't agree with the conclusion of that.
> >>
> >> What the parameter would be useful for is to be able to tune those
> costs (or just turn it off) *for that individual query*. That doesn't mean
> you "rarely JIT anything atll", it just means you rarely JIT that
> particular query.
>
> I just struggle to imagine that anyone is going to spend much effort
> tuning a warning parameter per query.  I imagine they're far more
> likely to just ramp it up to only catch some high percentile problems
> or just (more likely) just not bother with it.  It seems more likely
> that if anyone was to tune anything per query here it would be
> jit_above_cost, since that actually might have an affect on the
> performance of the query, rather than if it spits out some warning
> message or not.  ISTM that if the user knows what to set it to per
> query, then there's very little point in having a warning as we'd be
> alerting them to something they already know about.
>

I would not expect people to tune the *warning* at a query level. If
anything, then yes, they would tune either jit_above_cost or just
jit=off. But the idea being you can do that on a per query level instead of
globally.


I looked in the -general list to see if we could get some common
> explanations to give us an idea of the most common reason for high JIT
> compilation time. It seems that the plans were never simple. [1] seems
> due to a complex plan. I'm basing that off the "Functions: 167". I
> didn't catch the full plan. From what I can tell, [2] seems to be due
> to "lots of empty tables", so assuming the clamping at 1 page is
> causing issues there.  I think both of those cases could be resolved
> by building the costing the way I mentioned.  I admit that 2 cases is
> not a very large sample size.
>

Again, I am very much for improvements of the costing model. This is in no
way intended to be a replacement for that. It's intended to be a stop-gap.

What I see much of today are things like
https://dba.stackexchange.com/questions/264955/handling-performance-problems-with-jit-in-postgres-12
or
https://dev.to/xenatisch/cascade-of-doom-jit-and-how-a-postgres-update-led-to-70-failure-on-a-critical-national-service-3f2a

The bottom line is that people end up with recommendations to turn off JIT
globally more or less by default. Because there's no real useful way today
to figure out when it causes problems vs when it helps.

The addition to pg_stat_statements I pushed a short while ago would help
with that. But I think having a warning like this would also be useful. As
a stop-gap measure, yes, but we really don't know when we will have an
improved costing model for it. I hope you're right and that we can have it
by 16, and then I will definitely advocate for removing the warning again
if it works.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Kerberos delegation support in libpq and postgres_fdw

2022-04-08 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> The new krb_user_ccache is a lot closer to 'global', though it's
> specifically for user-authenticated backends (allowing the postmaster
> and other things like replication connections to use whatever the
> credential cache is set to by the administrator on startup), but that
> seems like it makes sense to me- generally you're not going to want
> regular user backends to be accessing the credential cache of the
> 'postgres' unix account on the server.

Added an explicit 'environment' option to allow for, basically, existing
behavior, where we don't mess with the environment variable at all,
though I kept the default as MEMORY since I don't think it's really
typical that folks actually want regular user backends to inherit the
credential cache of the server.

Added a few more tests and updated the documentation too.  Sadly, it seems
we've missed the deadline for v15 for lack of feedback on these.
Would really like to get some other folks commenting as these are new
pg_hba and postgresql.conf options being added.
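
For anyone trying to picture the intended use, a rough and purely
illustrative sketch (host, database and user names are invented, and the
password-less user mapping assumes the relaxed check described in the commit
message below rather than tested syntax):

    -- The client authenticates over GSS and delegates its credentials;
    -- postgres_fdw can then authenticate onward with those delegated
    -- credentials instead of a stored password.
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER other_pg FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'other.example.com', dbname 'appdb');

    CREATE USER MAPPING FOR CURRENT_USER SERVER other_pg
        OPTIONS (user 'alice');   -- no password option

    IMPORT FOREIGN SCHEMA public FROM SERVER other_pg INTO public;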

Thanks!

Stephen
From bd248c3fd82d04d3c12bf6c777f861134a45a101 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Thu, 7 Apr 2022 15:34:39 -0400
Subject: [PATCH] Add support for Kerberos credential delegation

Accept GSSAPI/Kerberos delegated credentials.  With this, a user could
authenticate to PostgreSQL using Kerberos credentials, delegate
credentials to the PostgreSQL server, and then the PostgreSQL server
could use those credentials to connect to another service, such as with
postgres_fdw or dblink or theoretically any other authenticated
connection which is able to use delegated credentials.

If an administrator prefers to not allow credentials to be delegated to
the server, they can be disallowed using a new pg_hba option for gss
called 'allow_cred_delegation'.

A new server GUC has also been introduced to allow an administrator to
control what the kerberos credential cache is configured to for user
authenticated backends, krb_user_ccache.  This defaults to MEMORY:,
which is where delegated credentials are stored (and is otherwise empty,
avoiding the risk of an administrator's credentials on the server being
mistakenly picked up and used).

Original patch by: Peifeng Qiu, whacked around some by me.
Reviewed-by: Jacob Champion
Discussion: https://postgr.es/m/co1pr05mb8023cc2cb575e0faad7df4f8a8...@co1pr05mb8023.namprd05.prod.outlook.com
---
 contrib/dblink/dblink.c   |   6 +-
 .../postgres_fdw/expected/postgres_fdw.out|   2 +-
 contrib/postgres_fdw/option.c |   3 +
 doc/src/sgml/client-auth.sgml |  13 ++
 doc/src/sgml/config.sgml  |  28 
 doc/src/sgml/libpq.sgml   |  19 +++
 src/backend/libpq/auth.c  |  27 +++-
 src/backend/libpq/be-gssapi-common.c  |  51 +++
 src/backend/libpq/be-secure-gssapi.c  |  19 ++-
 src/backend/libpq/hba.c   |  19 +++
 src/backend/utils/adt/hbafuncs.c  |   4 +
 src/backend/utils/init/postinit.c |   8 +-
 src/backend/utils/misc/guc.c  |  15 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/libpq/auth.h  |   1 +
 src/include/libpq/be-gssapi-common.h  |   3 +
 src/include/libpq/hba.h   |   1 +
 src/include/libpq/libpq-be.h  |   3 +
 src/interfaces/libpq/exports.txt  |   1 +
 src/interfaces/libpq/fe-auth.c|  12 +-
 src/interfaces/libpq/fe-connect.c |  12 ++
 src/interfaces/libpq/fe-secure-gssapi.c   |   3 +-
 src/interfaces/libpq/libpq-fe.h   |   1 +
 src/interfaces/libpq/libpq-int.h  |   1 +
 src/test/kerberos/Makefile|   3 +
 src/test/kerberos/t/001_auth.pl   | 128 --
 src/test/perl/PostgreSQL/Test/Utils.pm|  27 
 27 files changed, 391 insertions(+), 20 deletions(-)

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index a06d4bd12d..e5b70e084e 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2643,7 +2643,7 @@ dblink_security_check(PGconn *conn, remoteConn *rconn)
 {
 	if (!superuser())
 	{
-		if (!PQconnectionUsedPassword(conn))
+		if (!(PQconnectionUsedPassword(conn) || PQconnectionUsedGSSAPI(conn)))
 		{
 			PQfinish(conn);
 			ReleaseExternalFD();
@@ -2652,8 +2652,8 @@ dblink_security_check(PGconn *conn, remoteConn *rconn)
 
 			ereport(ERROR,
 	(errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
-	 errmsg("password is required"),
-	 errdetail("Non-superuser cannot connect if the server does not request a password."),
+	 errmsg("password or GSSAPI is required"),
+	 errdetail("Non-superuser cannot connect if the server does not request a password or use GSSAPI."),
 	 errhint("Target server's authentication method must be 

Re: Add parameter jit_warn_above_fraction

2022-04-08 Thread David Rowley
On Fri, 8 Apr 2022 at 23:27, Magnus Hagander  wrote:
>
>
>
> On Wed, Mar 30, 2022 at 3:04 PM Magnus Hagander  wrote:
>>
>> On Tue, Mar 29, 2022 at 10:06 PM David Rowley  wrote:
>>>
>>> If we go with this patch,  the problem I see here is that the amount
>>> of work the JIT compiler must do for a given query depends mostly on
>>> the number of expressions that must be compiled in the query (also to
>>> a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
>>> jit_tuple_deforming and jit_expressions). The DBA does not really have
>>> much control over the number of expressions in the query.  All he or
>>> she can do to get rid of the warning is something like increase
>>> jit_above_cost.  After a few iterations of that, the end result is
>>> that jit_above_cost is now high enough that JIT no longer triggers
>>> for, say, that query to that table with 1000 partitions where no
>>> plan-time pruning takes place.  Is that really a good thing? It likely
>>> means that we just rarely JIT anything at all!
>>
>>
>> I don't agree with the conclusion of that.
>>
>> What the parameter would be useful for is to be able to tune those costs (or 
>> just turn it off) *for that individual query*. That doesn't mean you "rarely 
>> JIT anything at all", it just means you rarely JIT that particular query.

I just struggle to imagine that anyone is going to spend much effort
tuning a warning parameter per query.  I imagine they're far more
likely to just ramp it up to only catch some high percentile problems
or just (more likely) just not bother with it.  It seems more likely
that if anyone was to tune anything per query here it would be
jit_above_cost, since that actually might have an effect on the
performance of the query, rather than if it spits out some warning
message or not.  ISTM that if the user knows what to set it to per
query, then there's very little point in having a warning as we'd be
alerting them to something they already know about.

I looked in the -general list to see if we could get some common
explanations to give us an idea of the most common reason for high JIT
compilation time. It seems that the plans were never simple. [1] seems
due to a complex plan. I'm basing that off the "Functions: 167". I
didn't catch the full plan. From what I can tell, [2] seems to be due
to "lots of empty tables", so assuming the clamping at 1 page is
causing issues there.  I think both of those cases could be resolved
by building the costing the way I mentioned.  I admit that 2 cases is
not a very large sample size.

David

[1] 
https://www.postgresql.org/message-id/flat/CAPL5KHq8zfWPzueCemXw4c%2BU568PoDfqo3wBDNm3KAyvybdaMQ%40mail.gmail.com#35aca8b42c3862f44b6be5b260c1a109
[2] 
https://www.postgresql.org/message-id/flat/CAHOFxGo5xJt02RmwAWrtv2K0jcqqxG-cDiR8FQbvb0WxdKhcgw%40mail.gmail.com#12d91822e869a2e22ca830cb5632f549




Re: SQL/JSON: functions

2022-04-08 Thread Andrew Dunstan


On 4/8/22 08:02, Justin Pryzby wrote:
> On Thu, Mar 31, 2022 at 04:25:58PM -0400, Andrew Dunstan wrote:
>> No code chunks left, only a documentation patch which should land
> Documentation review for a6baa4bad.
>
>> Construct a JSON the provided strings:
> a JSON what ?
> *from* the provided strings ?
>
>> Construct a JSON from the provided values various types:
> should say "a JSON scalar" ?
> *of* various types ?
>
>> Construct a JSON object from the provided key/value pairs of various types:
> For comparison, that one looks ok.
>
> +  JSON_EXISTS function checks whether the provided
> +   JSON_VALUE function extracts a value from the 
> provided
> +   JSON_QUERY function extracts an 
> SQL/JSON
> +  JSON_TABLE function queries 
> JSON data
> + JSON_TABLE uses the
> +  JSON_SERIALIZE function transforms a SQL/JSON 
> value
>
> I think all these should all begin with "THE >...< function ...", like the
> others do.
>
> +To use other types, you must create the CAST from 
> json for this type.
> => create a cast from json to this type.
>
> +Values can be null, but not keys.
> I think it's clearer to say "..but keys cannot."
>
> +  For any scalar other than a number or a Boolean the text
>
> Boolean COMMA the text
>
> + The path name must be unique and cannot coincide with column names.
> Maybe say "name must be unique and distinct from the column names."
>
> +  ... If you specify a GROUP BY
> +  or an ORDER BY clause, this function returns a 
> separate JSON object
> +  for each table row.
>
> "for each table row" sounds inaccurate or imprecise.  The SELECT docs say 
> this:
> | GROUP BY will condense into a single row all selected rows that share the 
> same values for the grouped expressions
>
> BTW, the documentation references look a little like OIDs...
> Does someone already have an SNMP-based doc browser ?
> | For details, see Section 9.16.3.4.2.



Many thanks, useful.

I already had a couple of these items on my list but I ran out of time
before tiredness overcame me last night.

I'm planning on removing some of that stuff that generates the last
complaint if I can do it without too much violence.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: SQL/JSON: functions

2022-04-08 Thread Justin Pryzby
On Thu, Mar 31, 2022 at 04:25:58PM -0400, Andrew Dunstan wrote:
> No code chunks left, only a documentation patch which should land

Documentation review for a6baa4bad.

> Construct a JSON the provided strings:

a JSON what ?
*from* the provided strings ?

> Construct a JSON from the provided values various types:

should say "a JSON scalar" ?
*of* various types ?

> Construct a JSON object from the provided key/value pairs of various types:

For comparison, that one looks ok.

+  JSON_EXISTS function checks whether the provided
+   JSON_VALUE function extracts a value from the provided
+   JSON_QUERY function extracts an 
SQL/JSON
+  JSON_TABLE function queries JSON 
data
+ JSON_TABLE uses the
+  JSON_SERIALIZE function transforms a SQL/JSON value

I think all these should all begin with "THE >...< function ...", like the
others do.

+To use other types, you must create the CAST from 
json for this type.
=> create a cast from json to this type.

+Values can be null, but not keys.
I think it's clearer to say "..but keys cannot."

+  For any scalar other than a number or a Boolean the text

Boolean COMMA the text

+ The path name must be unique and cannot coincide with column names.
Maybe say "name must be unique and distinct from the column names."

+  ... If you specify a GROUP BY
+  or an ORDER BY clause, this function returns a 
separate JSON object
+  for each table row.

"for each table row" sounds inaccurate or imprecise.  The SELECT docs say this:
| GROUP BY will condense into a single row all selected rows that share the 
same values for the grouped expressions

BTW, the documentation references look a little like OIDs...
Does someone already have an SNMP-based doc browser ?
| For details, see Section 9.16.3.4.2.




Re: PROXY protocol support

2022-04-08 Thread Magnus Hagander
On Sat, Apr 2, 2022 at 12:17 AM wilfried roset 
wrote:

> Hi,
>
> I've been able to test the patch. Here is a recap of the experimentation.
>
> # Setup
>
> All tests have been done with 3 VMs (PostgreSQL, HAproxy, psql client) on
> Debian 11 communicating over a private network.
> * PostgreSQL have been built with proxy_protocol_11.patch applied on
> master branch (465ab24296).
> * psql client is from postgresql-client-13 from Debian 11 repository.
> * HAproxy version used is 2.5.5-1~bpo11+1 installed from
> https://haproxy.debian.net
>
> # Configuration
>
> PostgreSQL has been configured to listen only on its private IP. To enable
> proxy protocol support `proxy_port` has been configured to `5431` and
> `proxy_servers` to `10.0.0.0/24`. `log_connections` has been turned on to
> make sure the correct IP address is logged. `log_min_duration_statement`
> has been configured to 0 to log all queries. Finally `log_destination` has
> been configured to `csvlog`.
>
> pg_hba.conf is like this:
>
>   local   all all trust
>   hostall all 127.0.0.1/32trust
>   hostall all ::1/128 trust
>   local   replication all trust
>   hostreplication all 127.0.0.1/32trust
>   hostreplication all ::1/128 trust
>   hostall all 10.0.0.208/32   md5
>
> Where 10.0.0.208 is the IP of the psql client's VM.
>
> HAproxy has two frontends, one for proxy protocol (port 5431) and one for
> regular TCP traffic. The configuration looks like this:
>
>   listen postgresql
>   bind 10.0.0.222:5432
>   server pg 10.0.0.253:5432 check
>
>   listen postgresql_proxy
>   bind 10.0.0.222:5431
>   server pg 10.0.0.253:5431 send-proxy-v2
>
> Where 10.0.0.222 is the IP of HAproxy's VM and 10.0.0.253 is the IP of
> PostgreSQL's VM.
>
> # Tests
>
> * from psql's vm to haproxy on port 5432 (no proxy protocol)
>   --> connection denied by pg_hba.conf, as expected
>
> * from psql's vm to postgresql's VM on port 5432 (no proxy protocol)
>   --> connection success with psql's vm ip in logfile and pg_stat_activity
>
> * from psql's vm to postgresql's VM on port 5431 (proxy protocol)
>   --> unable to open a connection, as expected
>
> * from psql's vm to haproxy on port 5431 (proxy protocol)
>   --> connection success with psql's vm ip in logfile and pg_stat_activity
>
> I've also tested without proxy protocol enabled (and pg_hba.conf updated
> accordingly), and PostgreSQL behaves as expected.
>
> # Conclusion
>
> From my point of view the documentation is clear enough and the feature
> works as expected.


Hi!

Thanks for this review and testing!

I think it could do with at least one more look-over at the source code
level at this point, though, since it's been sitting around for a while, so
it won't make it in for this deadline. But hopefully I can get it in early
in the next cycle!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: shared-memory based stats collector

2022-04-08 Thread Ranier Vilela
Hi,

Per Coverity.

pgstat_reset_entry does not check whether the lock was actually acquired.
If shared_stat_reset_contents is called without holding the lock,
isn't that an issue?

regards,

Ranier Vilela


0001-avoid-reset-stats-without-lock.patch
Description: Binary data


Re: Expose JIT counters/timing in pg_stat_statements

2022-04-08 Thread Magnus Hagander
On Tue, Mar 8, 2022 at 4:08 AM Julien Rouhaud  wrote:

> On Mon, Mar 07, 2022 at 01:40:34PM +0100, Magnus Hagander wrote:
> >
> > I wonder if there might be an interesting middle ground, or if that is
> > making it too much. That is, we could have an
> > Option 3:
> > jit_count
> > total_jit_time - for sum of functions+inlining+optimization+emission time
> > min_jit_time - for sum of functions+inlining+optimization+emission time
> > max_jit_time - for sum of functions+inlining+optimization+emission time
> > mean_jit_time - for sum of functions+inlining+optimization+emission time
> > stddev_jit_time - for sum of functions+inlining+optimization+emission
> time
> > jit_functions
> > jit_generation_time
> > jit_inlining_count
> > jit_inlining_time
> > jit_optimization_count
> > jit_optimization_time
> > jit_emission_count
> > jit_emission_time
> >
> > That is, we'd get the more detailed timings across the total time, but
> > not on the details. But that might be overkill.
>
> I also thought about it but it seems overkill.  pg_stat_statements view is
> already very big, and I think that the JIT time should be somewhat stable,
> at
> least compared to how much a query execution time can vary depending on the
> parameters.  This approach would also be a bit useless if you change the
> costing of underlying JIT operation.
>
> > But -- here's an updated patched based on Option 2.
>
> Thanks!
>
> Code-wide, the patch looks good.  For the doc, it seems that you documented
> jit_inlining_count three times rather than documenting
> jit_optimization_count
> and jit_emission_count.
>

Oops, thanks and fixed.


I don't think we can add tests there, and having a test for every new
> counter
> being >= 0 seems entirely useless, however there should be a new test
> added for
> the "oldextversions" test to make sure that there's no issue with old SQL
> / new
> shlib compatibility.  And looking at it I see that it was already missed
> for
> version 1.9 :(
>

Indeed. Fixed here.

Michael had already applied a patch that took us to 1.10 and added that
test, so I've just updated it here. I don't think we normally bump the
version twice in the same day, so I just merged the SQL script changes as
well.

PFA a "final" version for the CI to run.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out
index 2813eb16cb..efb2049ecf 100644
--- a/contrib/pg_stat_statements/expected/oldextversions.out
+++ b/contrib/pg_stat_statements/expected/oldextversions.out
@@ -197,44 +197,52 @@ SELECT count(*) > 0 AS has_data FROM pg_stat_statements;
 -- New functions and views for pg_stat_statements in 1.10
 AlTER EXTENSION pg_stat_statements UPDATE TO '1.10';
 \d pg_stat_statements
-View "public.pg_stat_statements"
-   Column|   Type   | Collation | Nullable | Default 
--+--+---+--+-
- userid  | oid  |   |  | 
- dbid| oid  |   |  | 
- toplevel| boolean  |   |  | 
- queryid | bigint   |   |  | 
- query   | text |   |  | 
- plans   | bigint   |   |  | 
- total_plan_time | double precision |   |  | 
- min_plan_time   | double precision |   |  | 
- max_plan_time   | double precision |   |  | 
- mean_plan_time  | double precision |   |  | 
- stddev_plan_time| double precision |   |  | 
- calls   | bigint   |   |  | 
- total_exec_time | double precision |   |  | 
- min_exec_time   | double precision |   |  | 
- max_exec_time   | double precision |   |  | 
- mean_exec_time  | double precision |   |  | 
- stddev_exec_time| double precision |   |  | 
- rows| bigint   |   |  | 
- shared_blks_hit | bigint   |   |  | 
- shared_blks_read| bigint   |   |  | 
- shared_blks_dirtied | bigint   |   |  | 
- shared_blks_written | bigint   |   |  | 
- local_blks_hit  | bigint   |   |  | 
- local_blks_read | bigint   |   |  | 
- local_blks_dirtied  | bigint   |   |  | 
- local_blks_written  | bigint   |   |  | 
- temp_blks_read  | bigint   |   |  | 
- temp_blks_written   | bigint   |   |  | 
- 

Re: generic plans and "initial" pruning

2022-04-08 Thread Amit Langote
Hi David,

On Fri, Apr 8, 2022 at 8:16 PM David Rowley  wrote:
> On Fri, 8 Apr 2022 at 17:49, Amit Langote  wrote:
> > Attached updated patch with these changes.
> Thanks for making the changes.  I started looking over this patch but
> really feel like it needs quite a few more iterations of what we've
> just been doing to get it into proper committable shape. There seems
> to be only about 40 mins to go before the freeze, so it seems very
> unrealistic that it could be made to work.

Yeah, totally understandable.

> I started trying to take a serious look at it this evening, but I feel
> like I just failed to get into it deep enough to make any meaningful
> improvements.  I'd need more time to study the problem before I could
> build up a proper opinion on how exactly I think it should work.
>
> Anyway. I've attached a small patch that's just a few things I
> adjusted or questions while reading over your v13 patch.  Some of
> these are just me questioning your code (See XXX comments) and some I
> think are improvements. Feel free to take the hunks that you see fit
> and drop anything you don't.

Thanks a lot for compiling those.

Most of the changes looked fine to me except for a couple of typos, so I've
adopted them into the attached new version, even though I know it's
too late to try to apply it.  Re the XXX comments:

+ /* XXX why would pprune->rti_map[i] ever be zero here??? */

Yeah, no, there can't be; I was perhaps being overly paranoid.

+ * XXX is it worth doing a bms_copy() on glob->minLockRelids if
+ * glob->containsInitialPruning is true?. I'm slighly worried that the
+ * Bitmapset could have a very long empty tail resulting in excessive
+ * looping during AcquireExecutorLocks().
+ */

I guess I trust your instincts about bitmapset operation efficiency
and what you've written here makes sense.  It's typical for leaf
partitions to have been appended toward the tail end of rtable and I'd
imagine their indexes would be in the tail words of minLockRelids.  If
copying the bitmapset removes those useless words, I don't see why we
shouldn't do that.  So added:

+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it just above to prevent empty tail bits resulting in
+ * inefficient looping during AcquireExecutorLocks().
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);

Not 100% sure about the comment I wrote.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v14-0001-Optimize-AcquireExecutorLocks-to-skip-pruned-par.patch
Description: Binary data


Re: Extract epoch from Interval weird behavior

2022-04-08 Thread Peter Eisentraut

On 24.02.22 03:35, Joseph Koshakow wrote:

However when executing EXTRACT we first truncate
DAYS_PER_YEAR to an integer, and then multiply it
by the total years in the Interval
/* this always fits into int64 */

secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) +
                       (int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
                       interval->day) * SECS_PER_DAY;

Is this truncation on purpose? It seems like
EXTRACT is not accounting for leap years in
its calculation.


This was not intentional.  The cast is only to make the multiplication 
happen in int64; it didn't mean to drop any fractional parts.



Oops, I sent that to the wrong email. If this isn't intended, I've created a patch
that fixes it, with the following two open questions:
  * DAYS_PER_YEAR_NUM is recalculated every time. Is there any way
to convert a float directly to a numeric to avoid this?


We really wanted to avoid doing calculations in numeric as much as
possible.  So we should figure out a different way to write this.  The
attached patch works for me.  It's a bit ugly since it hardcodes some
factors.  (The factor of 4 makes 4 * DAYS_PER_YEAR, i.e. 1461, an exact
integer, so the whole computation stays in int64; dividing SECS_PER_DAY by 4
at the end restores the scale.)  Maybe we can rephrase it a bit more
elegantly.

From 0b7222beb5260c710d79c9a0573c3b39a64acf1b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 8 Apr 2022 13:16:25 +0200
Subject: [PATCH v1] Fix extract epoch from interval calculation

The new numeric code for extract epoch from interval accidentally
truncated the DAYS_PER_YEAR value to an integer, leading to results
that mismatched the floating-point interval_part calculations.

The commit a2da77cdb4661826482ebf2ddba1f953bc74afe4 that introduced
this actually contains the regression test change that this reverts.
I suppose this was missed at the time.

Reported-by: Joseph Koshakow 
Discussion: 
https://www.postgresql.org/message-id/flat/CAAvxfHd5n%3D13NYA2q_tUq%3D3%3DSuWU-CufmTf-Ozj%3DfrEgt7pXwQ%40mail.gmail.com
---
 src/backend/utils/adt/timestamp.c  | 6 +++---
 src/test/regress/expected/interval.out | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 1c0bf0aa5c..2677d27632 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -5288,9 +5288,9 @@ interval_part_common(PG_FUNCTION_ARGS, bool retnumeric)
int64   val;
 
/* this always fits into int64 */
-        secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) +
-                               (int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
-                               interval->day) * SECS_PER_DAY;
+        secs_from_day_month = ((int64) (4 * DAYS_PER_YEAR) * (interval->month / MONTHS_PER_YEAR) +
+                               (int64) (4 * DAYS_PER_MONTH) * (interval->month % MONTHS_PER_YEAR) +
+                               (int64) 4 * interval->day) * SECS_PER_DAY/4;
 
/*---
 * result = secs_from_day_month + interval->time / 
1'000'000
diff --git a/src/test/regress/expected/interval.out 
b/src/test/regress/expected/interval.out
index e4b1246f45..8e2d535543 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -953,11 +953,11 @@ SELECT f1,
  @ 1 min   |   0 |   0.000 |   0.00 |  
1 |0 |   0 | 0 |   1 |0 |  0 |   0 |  0 |   
  60.00
  @ 5 hours |   0 |   0.000 |   0.00 |  
0 |5 |   0 | 0 |   1 |0 |  0 |   0 |  0 |  
18000.00
  @ 10 days |   0 |   0.000 |   0.00 |  
0 |0 |  10 | 0 |   1 |0 |  0 |   0 |  0 | 
864000.00
- @ 34 years|   0 |   0.000 |   0.00 |  
0 |0 |   0 | 0 |   1 |   34 |  3 |   0 |  0 | 
1072224000.00
+ @ 34 years|   0 |   0.000 |   0.00 |  
0 |0 |   0 | 0 |   1 |   34 |  3 |   0 |  0 | 
1072958400.00
  @ 3 mons  |   0 |   0.000 |   0.00 |  
0 |0 |   0 | 3 |   2 |0 |  0 |   0 |  0 |
7776000.00
  @ 14 secs ago |   -1400 |  -14000.000 | -14.00 |  
0 |0 |   0 | 0 |   1 |0 |  0 |   0 |  0 |   
 -14.00
  @ 1 day 2 hours 3 mins 4 secs | 400 |4000.000 |   4.00 |  
3 |2 |   1 | 0 |   1 |0 |  0 |   0 |  0 |  
93784.00
- @ 6 years | 

Re: Lowering the ever-growing heap->pd_lower

2022-04-08 Thread Matthias van de Meent
On Fri, 8 Apr 2022 at 01:01, Andres Freund  wrote:
>
> Hi,
>
> On 2022-04-04 19:24:22 -0700, Peter Geoghegan wrote:
> > We should definitely increase MaxHeapTuplesPerPage before too long,
> > for a variety of reasons that I have talked about in the past. Its
> > current value is 291 on all mainstream platforms, a value that's
> > derived from accidental historic details -- which predate HOT.
>
> I'm on-board with that - but I think we should rewrite a bunch of places that
> use MaxHeapTuplesPerPage sized-arrays on the stack first. It's not great using
> several KB of stack at the current the current value already (*), but if it 
> grows
> further...

Yeah, I think we should definitely support more line pointers on a
heap page, but abusing MaxHeapTuplesPerPage for that is misleading:
the current value is the physical limit for heap tuples, as we have at
most 1 heap tuple per line pointer and thus the MaxHeapTuplesPerPage
won't change. A macro MaxHeapLinePointersPerPage would probably be
more useful, which could be as follows (assuming we don't want to
allow filling a page with effectively only dead line pointers):

#define MaxHeapLinePointersPerPage \
   ((int) (((BLCKSZ - SizeOfPageHeaderData) / \
 (MAXALIGN(SizeofHeapTupleHeader) + 2 * sizeof(ItemIdData))) * 2))

This accounts for the worst case of one redirect + one min-sized live
heap tuple, and fills the page with it. Although impossible to put a
page in such a state, that would be the worst case of live line
pointers on a page.
For the default BLCKSZ of 8kB, that results in 510 line pointers
used-but-not-dead, an increase of ~ 70% over what's currently
available.
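
For anyone wanting to check the arithmetic, the figures can be reproduced
with the usual 8kB-page constants (BLCKSZ = 8192, SizeOfPageHeaderData = 24,
MAXALIGN(SizeofHeapTupleHeader) = 24, sizeof(ItemIdData) = 4 -- plugged in
by hand here, so purely illustrative); integer division mirrors the macros:

    SELECT (8192 - 24) / (24 + 4)         AS max_heap_tuples_per_page,        -- 291
           (8192 - 24) / (24 + 2 * 4) * 2 AS max_heap_line_pointers_per_page; -- 510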

-Matthias




Re: Support logical replication of DDLs

2022-04-08 Thread Alvaro Herrera
On 2022-Apr-08, Amit Kapila wrote:

> On Thu, Mar 17, 2022 at 3:36 AM Alvaro Herrera  
> wrote:
> >
> > Did you see some old code I wrote towards this goal?
> > https://www.postgresql.org/message-id/20150215044814.gl3...@alvh.no-ip.org
> > The intention was that DDL would produce some JSON blob that accurately
> > describes the DDL that was run;
> 
> I have read that thread and found one of your emails [1] where you
> seem to be saying that JSON representation is not required for BDR.
> Will in some way going via JSON blob way make this project
> easier/better?

I don't know if replication support will be easier by using JSON; I just
think that JSON makes the overall feature more easily usable for other
purposes.

I am not familiar with BDR replication nowadays.

> For runtime conditions, one of the things you have mentioned in that
> thread is to add schema name in the statement at the required places
> which this patch deals with in a different way by explicitly sending
> it along with the DDL statement.

Hmm, ok.  The point of the JSON-blob route is that the publisher sends a
command representation that can be parsed/processed/transformed
arbitrarily by the subscriber using generic rules; it should be trivial
to use a JSON tool to change schema A to schema B in any arbitrary DDL
command, and produce another working DDL command without having to know
how to write that command specifically.  So if I have a rule that
"schema A there is schema B here", all DDL commands can be replayed with
no further coding (without having to rely on getting the run-time
search_path correct.)
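
As a toy illustration of that idea (the blob here is invented for the
example; the real deparsed format is richer), a generic rule can rewrite the
schema without understanding the specific command:

    SELECT jsonb_set(
             '{"command": "CREATE TABLE", "schema": "a", "name": "t1",
               "columns": [{"name": "id", "type": "int"}]}'::jsonb,
             '{schema}', to_jsonb('b'::text));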

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El sudor es la mejor cura para un pensamiento enfermo" (Bardia)




Re: Last day of commitfest

2022-04-08 Thread Magnus Hagander
On Thu, Apr 7, 2022 at 3:59 AM Julien Rouhaud  wrote:

>
> > * JIT counters in pg_stat_statements (Magnus Hagander)
> >  Feedback from Dmitry Dolgov and Julien Rouhaud
>
> Note that the code looks good and no one disagreed with the proposed
> fields.
>
> The only remaining problem is a copy/pasto in the docs so nothing
> critical.  I
> personally think that it would be very good to have so maybe Magnus will
> push
> it today (which would probably instantly break the other pg_stat_statements
> patches that are now Ready for Committer), and if not I think it should go
> to
> the next commitfest instead.
>

Dang, I missed that one while looking at the other jit patch.

It did already conflict with the patch that Michael applied, but I'll try
to clean that up quickly and apply it with this fix.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Add parameter jit_warn_above_fraction

2022-04-08 Thread Stephen Frost
Greetings,

On Fri, Apr 8, 2022 at 07:27 Magnus Hagander  wrote:

> On Wed, Mar 30, 2022 at 3:04 PM Magnus Hagander 
> wrote:
>
>> On Tue, Mar 29, 2022 at 10:06 PM David Rowley 
>> wrote:
>>
>>> On Wed, 30 Mar 2022 at 02:38, Robert Haas  wrote:
>>> > I think WARNING is fine. After all, the parameter is called
>>> > "jit_warn_above_fraction".
>>>
>>> I had a think about this patch.  I guess it's a little similar to
>>> checkpoint_warning. The good thing about the checkpoint_warning is
>>> that in the LOG message we give instructions about how the DBA can fix
>>> the issue, i.e increase max_wal_size.
>>>
>>> With the proposed patch I see there is no hint about what might be
>>> done to remove/reduce the warnings.  I imagine that's because it's not
>>> all that clear which GUC should be changed. In my view, likely
>>> jit_above_cost is the most relevant but there is also
>>> jit_inline_above_cost, jit_optimize_above_cost, jit_tuple_deforming
>>> and jit_expressions which are relevant too.
>>>
>>> If we go with this patch,  the problem I see here is that the amount
>>> of work the JIT compiler must do for a given query depends mostly on
>>> the number of expressions that must be compiled in the query (also to
>>> a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
>>> jit_tuple_deforming and jit_expressions). The DBA does not really have
>>> much control over the number of expressions in the query.  All he or
>>> she can do to get rid of the warning is something like increase
>>> jit_above_cost.  After a few iterations of that, the end result is
>>> that jit_above_cost is now high enough that JIT no longer triggers
>>> for, say, that query to that table with 1000 partitions where no
>>> plan-time pruning takes place.  Is that really a good thing? It likely
>>> means that we just rarely JIT anything at all!
>>>
>>
>> I don't agree with the conclusion of that.
>>
>> What the parameter would be useful for is to be able to tune those costs
>> (or just turn it off) *for that individual query*. That doesn't mean you
>> "rarely JIT anything atll", it just means you rarely JIT that particular
>> query.
>>
>> In fact, my goal is to specifically make people do that and *not* just
>> turn off JIT globally.
>>
>>
>> I'd much rather see us address the costing problem before adding some
>>> warning, especially a warning where it's not clear how to make go
>>> away.
>>>
>>
>> The easiest way would be to add a HINT that says turn off jit for this
>> particular query or something?
>>
>> I do agree that if we can make  "spending too much time on JIT vs query
>> runtime" go away completely, then there is no need for a parameter like
>> this.
>>
>> I still think the warning is useful. And I think it may stay useful even
>> after we have made the JIT costing smarter -- though that's not certain of
>> course.
>>
>>
> This patch is still sitting at "ready for committer".
>
> As an example, I have added such a hint in the attached.
>
> I still stand  by that this patch is better than nothing. Sure, I would
> love for us to adapt the JIT costing model and algorithm to make this not a
> problem. And once we've done that, we should remove the parameter again.
>
> It's not on by default, and it's trivial to remove in the future.
>
>
> Yes, we're right up at the deadline. I'd still like to get it in, so I'd
> really appreciate some further voices :)
>

Looks reasonable to me, so +1. The default has it off, so I seriously
doubt it’ll cause any issues, and it’ll be very handy on large and busy
systems with lots of queries for finding those that have a serious amount of
time being spent in JIT (and hopefully avoid folks just turning jit off
across the board, since that’s worse -- we need more data on jit and need to
work on improving it, not ending up with everyone turning it off).

Thanks!

Stephen

>

