Re: pg_receivewal - couple of improvements

2022-02-05 Thread Bharath Rupireddy
On Sun, Feb 6, 2022 at 12:16 PM Michael Paquier  wrote:
>
> On Thu, Feb 03, 2022 at 10:01:42PM +0800, Julien Rouhaud wrote:
> > I don't get it.  If you're missing WAL it means that you will have to do
> > that tedious manual work to retrieve them no matter what.  So on top of
> > that tedious work, you also have to make sure that you don't provide a
> > bogus start position.
>
> I may be wrong in saying that, but the primary use case I have seen
> for pg_receivewal is a service integration for archiving.

Not only for archiving, but it can also be used as a
synchronous/asynchronous standby.

> > Maybe that's a good idea but I'm still having a hard time imagining a
> > scenario where it would actually be a good idea.
>
> With the defaults that we have now in place (current LSN location,
> current slot's location or the archive location), I am not really
> convinced that we need more control in this area with the proposed
> option.

Having the start position as an option for pg_receivewal can be useful
in scenarios where the LSN/WAL file calculated from pg_receivewal's
target directory has been removed by the primary (for whatever reason). In
such scenarios, we have to manually remove the WAL files (a risky thing
IMO) in pg_receivewal's target directory so that pg_receivewal
will calculate the next start position from the slot info and then
from the server position via the IDENTIFY_SYSTEM command.

With the start position as an option, users can just provide the LSN
from which they want to stream the WAL; in the above case, it could be
the primary's latest checkpoint LSN.
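
For reference, that value could be obtained on the primary with a query
like the following (a sketch using the existing pg_control_checkpoint()
function; the proposed pg_receivewal option itself doesn't exist yet):

-- on the primary: the latest checkpoint LSN, usable as a start position
SELECT checkpoint_lsn FROM pg_control_checkpoint();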

I still feel that the start position as an option would be a good
addition here, so that users can choose the start position.

If others don't agree with the start position as an option, at least
making the current start position calculation (first, from
pg_receivewal's target directory; if not, from the slot's restart_lsn;
if not, from the server's IDENTIFY_SYSTEM command) a user-specified
option would be of great help. Users could say: "calculate the start
position from the target directory, the slot's restart_lsn, or the
server's IDENTIFY_SYSTEM command".

Regards,
Bharath Rupireddy.




Re: pg_upgrade should truncate/remove its logs before running

2022-02-05 Thread Tom Lane
Michael Paquier  writes:
> On Sun, Feb 06, 2022 at 01:58:21AM -0500, Tom Lane wrote:
>> As already mentioned, there's been no notice to buildfarm owners ...
>> so has Andrew actually made a release?

> There has been one as of 8 days ago:
> https://github.com/PGBuildFarm/client-code/releases

[ scrapes buildfarm logs ... ]

Not even Andrew's own buildfarm critters are using it, so
permit me leave to doubt that he thinks it's fully baked.

Andrew?

regards, tom lane




Re: pg_upgrade should truncate/remove its logs before running

2022-02-05 Thread Michael Paquier
On Sun, Feb 06, 2022 at 01:58:21AM -0500, Tom Lane wrote:
> As already mentioned, there's been no notice to buildfarm owners ...
> so has Andrew actually made a release?

There has been one as of 8 days ago:
https://github.com/PGBuildFarm/client-code/releases

And I have just looked at that as a point of reference.
--
Michael




Re: pg_upgrade should truncate/remove its logs before running

2022-02-05 Thread Julien Rouhaud
On Sun, Feb 06, 2022 at 01:58:21AM -0500, Tom Lane wrote:
> Michael Paquier  writes:
> > So, it took me some time to get back to this thread, and I have looked at
> > it for the last couple of days...  The buildfarm client v14 has been
> > released on the 29th of January, which means that we are good to go.
> 
> As already mentioned, there's been no notice to buildfarm owners ...
> so has Andrew actually made a release?

There's a v14 release on the github project ([1]) from 8 days ago, so it seems
so.

[1] https://github.com/PGBuildFarm/client-code/releases/tag/REL_14




Re: Ensure that STDERR is empty during connect_ok

2022-02-05 Thread Michael Paquier
On Wed, Feb 02, 2022 at 03:40:39PM +0100, Daniel Gustafsson wrote:
> As part of the NSS patchset, quite a few bugs (and NSS quirks) were found by
> inspecting STDERR in connect_ok and requiring it to be empty.  This is not
> really NSS specific, and could help find issues in other libraries as well,
> so I propose to apply it regardless of the fate of the NSS patchset.

Sounds sensible from here.  Thanks!

All the code paths seem to be covered, at a quick glance.
--
Michael




Re: pg_upgrade should truncate/remove its logs before running

2022-02-05 Thread Tom Lane
Michael Paquier  writes:
> So, it took me some time to get back to this thread, and I have looked at
> it for the last couple of days...  The buildfarm client v14 has been
> released on the 29th of January, which means that we are good to go.

As already mentioned, there's been no notice to buildfarm owners ...
so has Andrew actually made a release?

regards, tom lane




Re: libpq async duplicate error results

2022-02-05 Thread Fabien COELHO



Hello Tom,


> I concur with Fabien's analysis: we report the FATAL message from
> the server during the first PQgetResult, and then the second call
> discovers that the connection is gone and reports "server closed
> the connection unexpectedly".  Those are two independent events
> (in libpq's view anyway) so reporting them separately is correct,
> or at least necessary unless you want to engage in seriously
> major redesign and behavioral changes.
>
> It is annoying that some of the text is duplicated in the second
> report, but in the end that's cosmetic, and I'm not sure we can
> improve it without breaking other things.  In particular, we
> can't just think about what comes back in the PQgetResult calls,
> we also have to consider what PQerrorMessage(conn) will report
> afterwards.  So I don't think that resetting conn->errorMessage
> during a PQgetResult series would be a good fix.
>
> An idea that I don't have time to pursue right now is to track
> how much of conn->errorMessage has been read out by PQgetResult,
> and only report newly-added text in later PQgetResult calls of
> a series, while PQerrorMessage would continue to report the
> whole thing.  Buffer resets would occur where they do now.
>
> It'd probably be necessary to make a user-visible PQgetResult
> that becomes a wrapper around PQgetResultInternal, because
> internal callers such as PQexecFinish will need the old behavior,
> or at least some of them will.


I agree that the message reset is not convincing, but it was the only way 
to get the expected behavior without changing the existing functions.


I see two paths:

(1) keep the duplicate message in this corner case.

(2) do the hocus pocus you suggest around PQerrorMessage and PQgetResult 
to work around the duplication, just for this corner case.


I'd tend to choose (1) to keep it simple, even if (2) is feasible.

--
Fabien.




Re: 2022-01 Commitfest

2022-02-05 Thread Julien Rouhaud
Hi,

On Sun, Feb 06, 2022 at 03:49:50PM +0900, Michael Paquier wrote:
> On Wed, Feb 02, 2022 at 01:00:18PM -0500, Tom Lane wrote:
> > Agreed, we're not here to cause make-work for submitters.  RWF is
> > appropriate if the patch has been in Waiting On Author for awhile
> > and doesn't seem to be going anywhere, but otherwise we should
> > just punt it to the next CF.
> 
> FWIW, I just apply a two-week rule here, i.e. half the commit fest
> period, to give people time to react:
> - If a patch has been waiting on author since the 15th of January,
> mark it as RwF.
> - If it only became waiting on author after the 15th of January,
> move it to the next CF.

Thanks.  Note that I was planning to do that on Monday, as it didn't seem
urgent enough to spend time on it during the weekend.




Re: Ensure that STDERR is empty during connect_ok

2022-02-05 Thread Michael Paquier
On Wed, Feb 02, 2022 at 04:42:09PM +0100, Daniel Gustafsson wrote:
> Disclaimer: longer term I would prefer to remove test plan counting like the
> above and Tom's comment downthread.

That would be really nice.
--
Michael




Re: 2022-01 Commitfest

2022-02-05 Thread Michael Paquier
On Wed, Feb 02, 2022 at 01:00:18PM -0500, Tom Lane wrote:
> Agreed, we're not here to cause make-work for submitters.  RWF is
> appropriate if the patch has been in Waiting On Author for awhile
> and doesn't seem to be going anywhere, but otherwise we should
> just punt it to the next CF.

FWIW, I just apply a two-week rule here, i.e. half the commit fest
period, to give people time to react:
- If a patch has been waiting on author since the 15th of January,
mark it as RwF.
- If it only became waiting on author after the 15th of January,
move it to the next CF.

> Anyway, thanks to Julien for doing this mostly-thankless task
> this time!

+1.
--
Michael




Re: pg_receivewal - couple of improvements

2022-02-05 Thread Michael Paquier
On Thu, Feb 03, 2022 at 10:01:42PM +0800, Julien Rouhaud wrote:
> I don't get it.  If you're missing WAL it means that you will have to do that
> tedious manual work to retrieve them no matter what.  So on top of that
> tedious work, you also have to make sure that you don't provide a bogus start
> position.

I may be wrong in saying that, but the primary use case I have seen
for pg_receivewal is a service integration for archiving.

> Maybe that's a good idea but I'm still having a hard time imagining a scenario
> where it would actually be a good idea.

With the defaults that we have now in place (current LSN location,
current slot's location or the archive location), I am not really
convinced that we need more control in this area with the proposed
option.
--
Michael




Re: Windows now has fdatasync()

2022-02-05 Thread Michael Paquier
On Sun, Dec 12, 2021 at 03:48:10PM +1300, Thomas Munro wrote:
> I tried out a quick POC patch and it runs a bit faster than fsync(), as
> expected.

Good news, as too high a difference would be suspect :)

How much of a difference does it make in %, and are the numbers fairly
reproducible?  Just wondering...
--
Michael




Re: pg_upgrade should truncate/remove its logs before running

2022-02-05 Thread Michael Paquier
On Sun, Feb 06, 2022 at 02:03:44PM +0800, Julien Rouhaud wrote:
> I didn't follow that thread closely, but if having the latest buildfarm client
> version installed is a hard requirement, this will likely be a problem.  First,
> there was no email to warn buildfarm owners that a new version is available,
> and even if there was, I doubt that every owner would have updated it since.
> Especially since this is the lunar new year period, at least two buildfarm
> owners (me included) have been on holiday since last week.

The buildfarm will still be able to work as it did, so that's not a
hard requirement per se.  The only thing changing is that we would not
find the logs in the event of a failure in the tests of pg_upgrade,
and the buildfarm client is coded to never fail if it does not see
logs in some of the paths it looks at; it just holds the full history
of the paths we have used across the ages.
--
Michael




Re: Unclear problem reports

2022-02-05 Thread Bharath Rupireddy
On Sun, Feb 6, 2022 at 5:28 AM Noah Misch  wrote:
>
> > > - Canned responses are noise to every list reader other than the OP.
> > > - Canned responses make the list feel like a sales funnel rather than
> > >   a normal free software mailing list.
> >
> > All bug reports posted through the bug form get moderated before
> > they're passed through. Moderators also have access to a set of
> > pre-defined canned responses (such as the example of where they should
> > go to report pgadmin bugs). It will also catch posts made directly to
> > -bugs by people who are not subscribed, but people who are subscribed
> > will not have their posts moderated by default.
> >
> > If we're talking about people submitting bug reports, perhaps a lot
> > can be done with (1) a couple more such responses and (2) clearer
> > instructions for the moderators on when they are supposed to use
> > them.
> >
> > Erring on the side of letting them through is probably always the best
> > choice, but if it's clear that it can be better handled by a canned
> > response, we do have an actual system for that.
>
> I was referring to David G. Johnston's proposal to send canned email responses
> when a thread (presumably containing a not-obviously-unreasonable PostgreSQL
> question or report) gets no replies in N days.  The proposed email directed
> the user toward paid professional services.  Sending such a message at
> moderation time would solve the noise problem, but it would make the "sales
> funnel" problem worse.  (Also, I figure moderators won't know at moderation
> time which messages will get zero replies.)  For which part of $SUBJECT did
> you envision using new moderator responses?

IMO, having a PostgreSQL wiki page describing the steps to debug or
perhaps fix the most common problems/bugs is a good idea. Of course,
adding the steps to the wiki and maintaining it takes some effort from the
experts here. But it is better than having canned email responses
for bug reports, which require more effort than a wiki page. Today,
there are a lot of outside resources listing the steps to fix/debug a
Postgres-related issue. Users can always refer to the wiki page
and try things out on their own; if the steps don't fix the issue,
they can always reach out via the PostgreSQL bugs list.

I also agree with the point about improving some of the most common
error messages with hints on what to do next to fix the issue.

Regards,
Bharath Rupireddy.




Re: pg_upgrade should truncate/remove its logs before running

2022-02-05 Thread Julien Rouhaud
Hi,

On Sun, Feb 06, 2022 at 01:36:07PM +0900, Michael Paquier wrote:
> 
> The buildfarm client v14 has been
> released on the 29th of January, which means that we are good to go.

I didn't follow that thread closely, but if having the latest buildfarm client
version installed is a hard requirement, this will likely be a problem.  First,
there was no email to warn buildfarm owners that a new version is available,
and even if there was, I doubt that every owner would have updated it since.
Especially since this is the lunar new year period, at least two buildfarm
owners (me included) have been on holiday since last week.




Re: GUC flags

2022-02-05 Thread Michael Paquier
On Mon, Jan 31, 2022 at 04:56:45PM -0600, Justin Pryzby wrote:
> I'm not clear on what things are required/prohibited to allow/expect
> "installcheck" to pass.  It's possible that postgresql.conf doesn't even exist
> in the data dir, right?

There are no written instructions AFAIK, but I have a personal rule
not to break the tests in configurations where they worked
previously.

> It's okay with me if the config_file-reading stuff isn't re-implemented.

Actually, I am thinking that we should implement it before retiring
check_guc completely, but not in the fashion you are suggesting.  I
would be tempted to add something to the TAP tests in
src/test/misc/, where we initialize an instance to get the information
about all the GUCs from SQL, and map that to the sample file located
at pg_config --sharedir.  I actually have, in my patch set for
pg_upgrade's TAP tests, a Perl routine that could be used for this
purpose, as of the following in Cluster.pm:

+=item $node->config_data($option)
+
+Grab some data from pg_config, with $option being the command switch
+used.
+
+=cut
+
+sub config_data
+{
+   my ($self, $option) = @_;
+   local %ENV = $self->_get_env();
+
+   my ($stdout, $stderr);
+   my $result =
+ IPC::Run::run [ $self->installed_command('pg_config'), $option ],
+ '>', \$stdout, '2>', \$stderr
+ or die "could not execute pg_config";
+   chomp($stdout);
+   $stdout =~ s/\r$//;
+
+   return $stdout;
+}

What do you think?  (I was thinking about applying that separately
anyway, to lower the load of the pg_upgrade patch a bit.)
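
On the SQL side of that comparison, the starting point could be as simple
as the query below (a sketch; any filtering of GUCs that intentionally do
not appear in the sample file would still need to be added):

-- list all GUCs known to the running instance
SELECT name FROM pg_settings ORDER BY name;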
--
Michael




Re: pg_upgrade should truncate/remove its logs before running

2022-02-05 Thread Michael Paquier
On Sat, Jan 29, 2022 at 09:53:25AM +0900, Michael Paquier wrote:
> On Fri, Jan 28, 2022 at 06:27:29PM -0500, Andrew Dunstan wrote:
>> I have committed this. But it will take time to get every buildfarm own
>> to upgrade.
> 
> Thanks for that.

So, it took me some time to get back to this thread, and I have looked at
it for the last couple of days...  The buildfarm client v14 has been
released on the 29th of January, which means that we are good to go.

I have found one issue while reviewing things: the creation of the new
subdirectory and its contents should satisfy group permissions for the
new cluster's data folder, but we were not doing that properly as we
called GetDataDirectoryCreatePerm() after make_outputdirs() so we
missed the proper values for create_mode and umask().  The rest looked
fine, and I got a green CI run on my own repo.  Hence, applied.

I'll keep an eye on the buildfarm, just in case.
--
Michael




Re: Adding CI to our tree

2022-02-05 Thread Andres Freund
Hi,

On 2022-02-03 23:04:04 -0600, Justin Pryzby wrote:
> > I assume this doesn't yet work to a meaningful degree? Last time I checked
> > there were quite a few tests that needed to be invoked in a specific
> > directory.
> 
> It works - tap_check() does chdir().

Ah, I thought you'd implemented a target that does it all in one prove
invocation...


> It currently fails in 027_stream_regress.pl, although I keep hoping that it
> has been fixed...

That's likely because you're not setting REGRESS_OUTPUTDIR like
src/test/recovery/Makefile and recoverycheck() are doing.


Greetings,

Andres Freund




Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-02-05 Thread Andy Fan
Hi,

On Sat, Feb 5, 2022 at 9:32 PM Tomas Vondra wrote:

>
> I'm also not claiming this is 100% worth it - queries with a suitable
> combination of clauses (conditions on the join keys) seem rather
> uncommon.


Thanks for showing interest in this. I want to add some other use cases
which seem not very uncommon.  a) When we join on the key of a foreign
table, pushing down a qual to the foreign table would be pretty
good for reducing the data transferred over the network.  b) If people
join many partitioned tables on the partition key, but want to query
more than one partition (which means the qual on the partition key is not a
simple "partitionKey = Const"), then we have to do run-time partition
pruning (losing the chance for initial partition pruning). That makes a big
difference in performance as well.
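
As a sketch of case b), with a hypothetical schema (names invented here
just for illustration):

CREATE TABLE p1 (a int) PARTITION BY RANGE (a);
CREATE TABLE p2 (a int) PARTITION BY RANGE (a);
-- ... matching partitions of p1 and p2 ...
SELECT * FROM p1 JOIN p2 USING (a) WHERE p1.a BETWEEN 100 AND 300;
-- Deriving "p2.a BETWEEN 100 AND 300" from the join clause would let
-- the planner prune p2's partitions at plan time instead of at run time.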

I guess some of the people who think we may need this feature are not very
clear about what harm it would do if we added it (me included, of
course).  I summarized the earlier discussion and hacked up a solution at
[1]; the current state looks reasonable to me.  I'm not sure if I missed
any point.

> Of course, this breaks the estimates in the faster query, because we now
> apply the condition twice - once for the index scan, once as the join
> clause. So instead of ~100k rows the join is estimated as ~1000 rows.

I think my patch has addressed this. Here is the example:

postgres=# set geqo to off;  -- disable this feature, we have an estimation error.
                             -- using the geqo guc in the patch is just for easy testing.
SET
postgres=# explain analyze SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
  WHERE (t1.a > 99000) and t2.a > 99000;
                                QUERY PLAN
--------------------------------------------------------------------------
 Merge Join  (cost=0.73..2408.37 rows=990 width=8)
             (actual time=0.032..21.350 rows=99900 loops=1)
   Merge Cond: (t1.a = t2.a)
   ->  Index Only Scan using t1_a_idx on t1
           (cost=0.29..29.64 rows=991 width=4)
           (actual time=0.014..0.121 rows=1000 loops=1)
         Index Cond: (a > 99000)
         Heap Fetches: 0
   ->  Index Only Scan using t2_a_idx on t2
           (cost=0.43..2113.20 rows=101301 width=4)
           (actual time=0.013..9.854 rows=99900 loops=1)
         Index Cond: (a > 99000)
         Heap Fetches: 0
 Planning Time: 0.282 ms
 Execution Time: 24.823 ms
(10 rows)


postgres=# set geqo to on;  -- enable this feature and let the planner derive
                            -- the qual by itself; the estimation is good.
SET
postgres=# explain analyze SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
  WHERE (t1.a > 99000) ;
                                QUERY PLAN
--------------------------------------------------------------------------
 Merge Join  (cost=0.73..2408.37 rows=97680 width=8)
             (actual time=0.031..21.296 rows=99900 loops=1)
   Merge Cond: (t1.a = t2.a)
   ->  Index Only Scan using t1_a_idx on t1
           (cost=0.29..29.64 rows=991 width=4)
           (actual time=0.014..0.116 rows=1000 loops=1)
         Index Cond: (a > 99000)
         Heap Fetches: 0
   ->  Index Only Scan using t2_a_idx on t2
           (cost=0.43..2113.20 rows=101301 width=4)
           (actual time=0.012..9.751 rows=99900 loops=1)
         Index Cond: (a > 99000)
         Heap Fetches: 0
 Planning Time: 0.269 ms
 Execution Time: 24.749 ms
(10 rows)


So I think knowing what harm this feature could do is the key point for
discussion now.

[1]
https://www.postgresql.org/message-id/CAKU4AWpo9z0hMHDWUKuce4Z-NpcybV0J2UVu5%2BDVwyP-CrHCQg%40mail.gmail.com


-- 
Best Regards
Andy Fan


Re: Unclear problem reports

2022-02-05 Thread Noah Misch
On Sun, Feb 06, 2022 at 12:41:52AM +0100, Magnus Hagander wrote:
> On Sun, Feb 6, 2022 at 12:02 AM Noah Misch  wrote:
> > On Fri, Feb 04, 2022 at 01:36:46PM -0500, Bruce Momjian wrote:
> > > On Wed, Feb  2, 2022 at 07:21:19PM -0700, David G. Johnston wrote:
> > > > Have pg_lister queue up a check for, say, two or three days after the
> > > > bug reporting form is filled out.  If the report hasn't been responded
> > > > to by someone other than the OP send out a reply that basically says:
> > > >
> > > > We're sorry your message hasn't yet attracted a response.  If your
> > > > report falls into the category of "tech support for a malfunctioning
> > > > server", and this includes error messages that are difficult or
> > > > impossible to replicate in a development environment (maybe give some
> > > > examples), you may wish to consider eliciting paid professional help.
> > > > Please see this page on our website for a directory of companies that
> > > > provide such services.  The PostgreSQL Core Project itself refrains
> > > > from making recommendations since many, if not all, of these companies
> > > > contribute back to the project in order to keep it both free and open
> > > > source.
> > >
> > > Yes, that is an idea.  I have canned email responses for common issues
> > > like PGAdmin questions on the bugs list, but for these cases, I don't
> > > know if someone might actually know the answer, and I don't know how
> > > long to wait for an answer.  Should we be going the other way and state
> > > on the bugs submission page,
> > > https://www.postgresql.org/account/submitbug/:
> > >
> > >   If you are having a serious problem with the software and do not
> > >   receive a reply, consider additional support channels, including
> > >   professional support (https://www.postgresql.org/support/).
> >
> >
> > https://www.postgresql.org/account/submitbug/ does say to "ensure you have
> > read" https://www.postgresql.org/docs/current/bug-reporting.html, which
> > says, "If you need help immediately, consider obtaining a commercial support
> > contract."  Hence, this expands on an existing message and makes it more
> > prominent.  I think that's reasonable, though I'm not sure it's a net win.
> > One could also edit the page that appears after submitting a bug.  Editing a
> > web page is superior to using canned email responses, for two reasons:
> >
> > - Canned responses are noise to every list reader other than the OP.
> > - Canned responses make the list feel like a sales funnel rather than
> >   a normal free software mailing list.
> 
> All bug reports posted through the bug form get moderated before
> they're passed through. Moderators also have access to a set of
> pre-defined canned responses (such as the example of where they should
> go to report pgadmin bugs). It will also catch posts made directly to
> -bugs by people who are not subscribed, but people who are subscribed
> will not have their posts moderated by default.
> 
> If we're talking about people submitting bug reports, perhaps a lot
> can be done with (1) a couple more such responses and (2) clearer
> instructions for the moderators on when they are supposed to use
> them.
> 
> Erring on the side of letting them through is probably always the best
> choice, but if it's clear that it can be better handled by a canned
> response, we do have an actual system for that.

I was referring to David G. Johnston's proposal to send canned email responses
when a thread (presumably containing a not-obviously-unreasonable PostgreSQL
question or report) gets no replies in N days.  The proposed email directed
the user toward paid professional services.  Sending such a message at
moderation time would solve the noise problem, but it would make the "sales
funnel" problem worse.  (Also, I figure moderators won't know at moderation
time which messages will get zero replies.)  For which part of $SUBJECT did
you envision using new moderator responses?




Re: Unclear problem reports

2022-02-05 Thread Magnus Hagander
On Sun, Feb 6, 2022 at 12:02 AM Noah Misch  wrote:
>
> On Fri, Feb 04, 2022 at 01:36:46PM -0500, Bruce Momjian wrote:
> > On Wed, Feb  2, 2022 at 07:21:19PM -0700, David G. Johnston wrote:
> > > On Wed, Feb 2, 2022 at 5:35 PM Bruce Momjian  wrote:
> > > Have pg_lister queue up a check for, say, two or three days after the bug
> > > reporting form is filled out.  If the report hasn't been responded to by
> > > someone other than the OP send out a reply that basically says:
> > >
> > > We're sorry your message hasn't yet attracted a response.  If your
> > > report falls into the category of "tech support for a malfunctioning
> > > server", and this includes error messages that are difficult or
> > > impossible to replicate in a development environment (maybe give some
> > > examples), you may wish to consider eliciting paid professional help.
> > > Please see this page on our website for a directory of companies that
> > > provide such services.  The PostgreSQL Core Project itself refrains
> > > from making recommendations since many, if not all, of these companies
> > > contribute back to the project in order to keep it both free and open
> > > source.
> >
> > Yes, that is an idea.  I have canned email responses for common issues
> > like PGAdmin questions on the bugs list, but for these cases, I don't
> > know if someone might actually know the answer, and I don't know how
> > long to wait for an answer.  Should we be going the other way and state
> > on the bugs submission page, https://www.postgresql.org/account/submitbug/:
> >
> >   If you are having a serious problem with the software and do not
> >   receive a reply, consider additional support channels, including
> >   professional support (https://www.postgresql.org/support/).
>
> https://www.postgresql.org/account/submitbug/ does say to "ensure you have
> read" https://www.postgresql.org/docs/current/bug-reporting.html, which says,
> "If you need help immediately, consider obtaining a commercial support
> contract."  Hence, this expands on an existing message and makes it more
> prominent.  I think that's reasonable, though I'm not sure it's a net win.
> One could also edit the page that appears after submitting a bug.  Editing a
> web page is superior to using canned email responses, for two reasons:
>
> - Canned responses are noise to every list reader other than the OP.
> - Canned responses make the list feel like a sales funnel rather than a normal
>   free software mailing list.

All bug reports posted through the bug form get moderated before
they're passed through. Moderators also have access to a set of
pre-defined canned responses (such as the example of where they should
go to report pgadmin bugs). It will also catch posts made directly to
-bugs by people who are not subscribed, but people who are subscribed
will not have their posts moderated by default.

If we're talking about people submitting bug reports, perhaps a lot
can be done with (1) a couple more such responses and (2) clearer
instructions for the moderators on when they are supposed to use
them.

Erring on the side of letting them through is probably always the best
choice, but if it's clear that it can be better handled by a canned
response, we do have an actual system for that.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: [BUG]Update Toast data failure in logical replication

2022-02-05 Thread Andres Freund
Hi,

On 2022-02-04 17:45:36 +0530, Amit Kapila wrote:
> diff --git a/contrib/test_decoding/expected/toast.out b/contrib/test_decoding/expected/toast.out
> index cd03e9d..a757e7d 100644
> --- a/contrib/test_decoding/expected/toast.out
> +++ b/contrib/test_decoding/expected/toast.out
> @@ -77,7 +77,7 @@ SELECT substr(data, 1, 200) FROM pg_logical_slot_get_changes('regression_slot',
>   table public.toasted_key: INSERT: id[integer]:1 toasted_key[text]:'1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
>   COMMIT
>   BEGIN
> - table public.toasted_key: UPDATE: id[integer]:1 toasted_key[text]:unchanged-toast-datum toasted_col1[text]:unchanged-toast-datum toasted_col2[text]:'987654321098765432109876543210987654321098765432109
> + table public.toasted_key: UPDATE: old-key: toasted_key[text]:'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678

Hm. This looks weird. What happened to the change to toasted_col2 that was
in the "removed" line?

This corresponds to the following statement I think:
-- test update of a toasted key without changing it
UPDATE toasted_key SET toasted_col2 = toasted_col1;
which previously was inserted as:
INSERT INTO toasted_key(toasted_key, toasted_col1) VALUES(repeat('1234567890', 200), repeat('9876543210', 200));

so toasted_col2 should have changed from NULL to repeat('9876543210', 200)


Am I misreading something?


Greetings,

Andres Freund




Re: Unclear problem reports

2022-02-05 Thread Andres Freund
Hi,

On 2022-02-02 19:35:36 -0500, Bruce Momjian wrote:
> The Postgres community is great at diagnosing problems and giving users
> feedback.  In most cases, we can either diagnose a problem and give a
> fix, or at least give users a hint at finding the cause.
>
> However, there is a class of problems that are very hard to help with,
> and I have perhaps seen an increasing number of them recently, e.g.:
>
>   https://www.postgresql.org/message-id/17384-f50f2eedf541e512%40postgresql.org
>   https://www.postgresql.org/message-id/CALTnk7uOrMztNfzjNOZe3TdquAXDPD3vZKjWFWj%3D-Fv-gmROUQ%40mail.gmail.com
>
> I consider these as problems that need digging to find the cause, and
> users are usually unable to do sufficient digging, and we don't have
> time to give them instructions, so they never get a reply.
>
> Is there something we can do to improve this situation?

I think some of this can be improved measurably, with moderate effort, by
improving our error messages / diagnostics.


Taking e.g. the second report:

[7804] LOG:  starting PostgreSQL 13.1, compiled by Visual C++ build 1914, 64-bit
[8812] LOG:  invalid primary checkpoint record
[8812] PANIC:  could not locate a valid checkpoint record
[7804] LOG:  startup process (PID 8812) was terminated by exception 0xC409
[7804] HINT:  See C include file "ntstatus.h" for a description of the
hexadecimal value.
[7804] LOG:  aborting startup due to startup process failure
[7804] LOG:  database system is shut down

We don't provide any details in this log report that would allow diagnosing
the problem:
"invalid primary checkpoint record" doesn't say *why* it's invalid, nor does
it say the location at which the checkpoint record was, nor whether the
cluster was shut down gracefully before. It could be that the permissions on
the WAL files were wrong, it could be missing files, actually invalid WAL, ...

The following bit about "startup process ... was terminated" and its
HINT is useless information, because we actually "knowingly" caused that
PANIC.  Even disregarding that, it certainly doesn't help to list numerical
exception codes and references to ntstatus.h.


The first report is similar, although it's a bit harder to improve:
ERROR: missing chunk number 0 for toast value 152073604 in pg_toast_2619

Easy to fix: We don't mention the table that pg_toast_2619 corresponds to -
something we can determine, rather than forcing the user to figure out how
to do so.
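
(For what it's worth, the owning table can already be looked up manually -
a sketch of the catalog query a user would need to know:

-- find the table whose TOAST table is pg_toast.pg_toast_2619
SELECT c.oid::regclass AS owning_table
FROM pg_class c
WHERE c.reltoastrelid = 'pg_toast.pg_toast_2619'::regclass;

but reporting it in the error message would save users from having to
know that.)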

Harder to fix: We don't provide information about where the reference to the
toast value comes from. In general we don't necessarily know that, because
Datums are handed around fairly widely. But in a lot of cases we actually
could know.


> Should we just tell them they need to hire a Postgres expert?  I assume
> these are users who do not already have access to such experts.

I think these are more our fault than our users' :(

Greetings,

Andres Freund




Re: Unclear problem reports

2022-02-05 Thread Noah Misch
On Fri, Feb 04, 2022 at 01:36:46PM -0500, Bruce Momjian wrote:
> On Wed, Feb  2, 2022 at 07:21:19PM -0700, David G. Johnston wrote:
> > On Wed, Feb 2, 2022 at 5:35 PM Bruce Momjian  wrote:
> > Have pg_lister queue up a check for, say, two or three days after the bug
> > reporting form is filled out.  If the report hasn't been responded to by
> > someone other than the OP send out a reply that basically says:
> > 
> > We're sorry your message hasn't yet attracted a response.  If your report
> > falls into the category of "tech support for a malfunctioning server", and
> > this includes error messages that are difficult or impossible to replicate
> > in a development environment (maybe give some examples), you may wish to
> > consider eliciting paid professional help.  Please see this page on our
> > website for a directory of companies that provide such services.  The
> > PostgreSQL Core Project itself refrains from making recommendations since
> > many, if not all, of these companies contribute back to the project in
> > order to keep it both free and open source.
> 
> Yes, that is an idea.  I have canned email responses for common issues
> like PGAdmin questions on the bugs list, but for these cases, I don't
> know if someone might actually know the answer, and I don't know how
> long to wait for an answer.  Should we be going the other way and state
> on the bugs submission page, https://www.postgresql.org/account/submitbug/:
> 
>   If you are having a serious problem with the software and do not
>   receive a reply, consider additional support channels, including
>   professional support (https://www.postgresql.org/support/).

https://www.postgresql.org/account/submitbug/ does say to "ensure you have
read" https://www.postgresql.org/docs/current/bug-reporting.html, which says,
"If you need help immediately, consider obtaining a commercial support
contract."  Hence, this expands on an existing message and makes it more
prominent.  I think that's reasonable, though I'm not sure it's a net win.
One could also edit the page that appears after submitting a bug.  Editing a
web page is superior to using canned email responses, for two reasons:

- Canned responses are noise to every list reader other than the OP.
- Canned responses make the list feel like a sales funnel rather than a normal
  free software mailing list.


It's safe to say $SUBJECT won't have a general solution.  There will always be
some vanguard of tough user experiences, each one implying a separate project.
For example, Julien replied to the "could not locate a valid checkpoint
record" -bugs thread with some followup questions, but one could improve the
log messages to answer those questions.  Where we don't know how to improve
the product enough, wiki pages (Julien suggested this) seem good.




Re: Release notes for February minor releases

2022-02-05 Thread Andres Freund
Hi,

On 2022-02-04 14:58:59 -0500, Tom Lane wrote:
> I've pushed the first draft for $SUBJECT at
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ab22eea83169c8d0eb15050ce61cbe3d7dae4de6

+Author: Andres Freund 
+Branch: master [18b87b201] 2022-01-13 18:13:41 -0800
+Branch: REL_14_STABLE [dad1539ae] 2022-01-14 10:56:12 -0800
+-->
+ 
+  Fix corruption of HOT chains when a RECENTLY_DEAD tuple changes
+  state to fully DEAD during page pruning (Andres Freund)
+ 

Even if that happens, it's still pretty unlikely to cause corruption - so
maybe s/corruption/chance of corruption/?


+ 
+  This happens when the last transaction that could see
+  the tuple ends while the page is being pruned.

The transaction doesn't need to have ended while the page is vacuumed - the
horizon needs to have been "refined/updated" while the page is pruned, so
that a tuple version that was first considered RECENTLY_DEAD is now
considered DEAD.  That can only happen if RecentXmin changed after
vacuum_set_xid_limits(), which in turn can only happen if catalog snapshot
invalidations and other invalidations are processed in vac_open_indexes()
and RecentXmin changed since vacuum_set_xid_limits().  Then a page
containing tuples in a specific "arrangement" needs to be encountered.

That's obviously too complicated for the release notes. Trying to make it
more understandable, I came up with the following, which still does not seem
great:

This can only happen if transactions, some having performed DDL, commit
within a narrow window at the start of VACUUM. If VACUUM then prunes a
page containing several tuple versions that started to be removable within
the aforementioned time window, the bug may cause corruption on that page
(but no further pages). A tuple that is pointed to by a redirect item
elsewhere on the page can get removed. [...]


Greetings,

Andres Freund




Re: Release notes for February minor releases

2022-02-05 Thread Andres Freund
Hi,

On 2022-02-04 22:27:54 +0100, Michael Banck wrote:
> On Fri, Feb 04, 2022 at 02:58:59PM -0500, Tom Lane wrote:
> > I've pushed the first draft for $SUBJECT at
> > 
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ab22eea83169c8d0eb15050ce61cbe3d7dae4de6
> > 
> > Please send comments/corrections by Sunday.
> 
> > + 
> > +  Fix corruption of HOT chains when a RECENTLY_DEAD tuple changes
> > +  state to fully DEAD during page pruning (Andres Freund)
> > + 
> > +
> > + 
> > +  This happens when the last transaction that could see
> > +  the tuple ends while the page is being pruned.  It was then possible
> > +  to remove a tuple that is pointed to by a redirect item elsewhere on
> > +  the page.  While that causes no immediate problem, when the item slot
> > +  is re-used by some new tuple, that tuple would be thought to be part
> > +  of the pre-existing HOT chain, creating a form of index corruption.
> 
> Well, ouchy.

I don't think the above description is quite accurate / makes it sound much
easier to hit than it is.

The time window in which the stars need to align badly is not a per-page
window, but a per-vacuum one. And the window is very narrow. Even if that
prerequisite were fulfilled, one additionally needs to encounter a pretty
rare combination of tids with very specific xid "ages".




> > +  If this seems to have affected a table, REINDEX
> > +  should repair the damage.
> 
> I don't think this is very helpful to the reader, are their indexes
> corrupt or not? If we can't tell them a specific command to run to
> check, can we at least mention that running amcheck would detect that
> (if it actually does)?

It does not detect it reliably.  Unfortunately heap amcheck does not verify
HOT chains to any meaningful degree. Nor does btree amcheck check whether
index tuples point to matching heap tuples :(


> Otherwise, I guess the only way to be sure is to
> just reindex every index? Or is this at least specific to b-trees?

It's an issue on the heap side, so unfortunately it is not btree specific.

Greetings,

Andres Freund




Re: Synchronizing slots from primary to standby

2022-02-05 Thread Andres Freund
Hi,

On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:
> From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut 
> Date: Mon, 3 Jan 2022 14:43:36 +0100
> Subject: [PATCH v3] Synchronize logical replication slots from primary to
>  standby

I've just skimmed the patch and the related threads. As far as I can tell this
cannot be safely used without the conflict handling in [1], is that correct?


Greetings,

Andres Freund

[1] 
https://postgr.es/m/CA%2BTgmoZd-JqNL1-R3RJ0jQRD%2B-dc94X0nPJgh%2BdwdDF0rFuE3g%40mail.gmail.com




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-02-05 Thread Bharath Rupireddy
On Tue, Feb 1, 2022 at 3:10 AM Greg Stark  wrote:
>
> So I looked at this patch and I have the same basic question as Bruce.

Thanks a lot for the comments.

> Do we really want to expose every binary tool associated with Postgres
> as an extension? Obviously this is tempting for cloud provider users
> which is not an unreasonable argument. But it does have consequences.

Perhaps not every tool needs to be exposed, but given the advantages
that pg_walinspect can provide, it's a good candidate to have as a
core extension. Some of the advantages are: debugging, WAL analysis,
feeding WAL stats and info to dashboards to show customers and answer
their queries, RCA, etc., as well as educational purposes - one can
understand the WAL structure, stats, different record types, and so on.
Another nice thing is getting raw WAL data out of the running server
(of course not all users can get it, only the allowed ones - currently
users with the pg_monitor role; if required we can change it to some
other predefined role). For instance, the raw WAL data can be fed to
external page repair tools to apply to a raw page (one can get the raw
page from the pageinspect extension).

> 1) Some things like pg_waldump are running code that is not normally
> under user control. This could have security issues or reliability
> issues.

I understand this, and I think the same concern applies to the
pageinspect extension, which exposes getting raw page data. The
pg_walinspect functions are currently accessible to users with the
pg_monitor role; if required we can change this to some other
predefined role.

> On that front I'm especially concerned that pg_verify_raw_wal_record()
> for example would let an attacker feed arbitrary hand crafted xlog
> records into the parser which is not normally something a user can do.
> If they feed it something it's not expecting it might be easy to cause
> a crash and server restart.

This function does nothing (no writes) to the server; it just checks
the CRC of the WAL record. If one can make the server crash
with some input at all, then that is a problem with the server code
which needs to be fixed. But IMO this function as such isn't a concern.

> There's also a bit of concern about data retention. Generally in
> Postgres when rows are deleted there's very weak guarantees about the
> data really being wiped. We definitely don't wipe it from disk of
> course. And things like pageinspect could expose it long after it's
> been deleted. But one might imagine after pageinspect shows it's gone
> and/or after a vacuum full the data is actually purged. But then
> something like pg_walinspect would make even that insufficient.

The idea of pg_walinspect is to get WAL info, data and stats out
of a running postgres server; if the WAL isn't available, the
functions will say so.

> 2) There's no documentation. I'm guessing you hesitated to write
> documentation until the interface is settled but actually sometimes
> writing documentation helps expose things in the interface that look
> strange when you try to explain them.

I will send out the new patch set with documentation soon.

> 3) And the interface does look a bit strange. Like what's the deal
> with pg_get_wal_record_info_2() ? I gather it's just a SRF version of
> pg_get_wal_record_info() but that's a strange name. And then what's
> the point of pg_get_wal_record_info() at all? Why wouldn't the SRF be
> sufficient even for the specific case of a single record?

I agree, pg_get_wal_record_info_2 is a poor name.
pg_get_wal_record_info_2 takes a range of LSNs (start and end) to give
the WAL info, whereas pg_get_wal_record_info takes just one LSN. Maybe
I will change pg_get_wal_record_info_2 to pg_get_wal_record_info_range
or pg_get_wal_records_info, or some other naming may be better? If the
suggestion is to overload pg_get_wal_record_info, one version with a
single LSN and another with start and end LSNs, I'm okay with that too.
Otherwise, I can have pg_get_wal_record_info with start and end LSNs
(end LSN defaulting to NULL) returning a set of records.
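
To sketch what that last variant might look like (hypothetical, since the
interface is still being discussed; the LSN values are made up):

-- single record
SELECT * FROM pg_get_wal_record_info('0/150E158');
-- all records in a range, with the end LSN defaulting to NULL
SELECT * FROM pg_get_wal_records_info('0/150E158', '0/1512820');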

> And the stats functions seem a bit out of place to me. If the SRF
> returned the data in the right format the user should be able to do
> aggregate queries to generate these stats easily enough. If anything a
> simple SQL function to do the aggregations could be provided.
>
> Now this is starting to get into the realm of bikeshedding but... Some
> of the code is taken straight from pg_waldump and does things like:
>
> + appendStringInfo(_blk_ref, "blkref #%u: rel %u/%u/%u fork %s blk %u",
>
> But that's kind of out of place for an SQL interface. It makes it hard
> to write queries since things like the relid, block number etc are in
> the string. If I wanted to use these functions I would expect to be
> doing something like "select all the decoded records pertaining to
> block n".

I will think more about this and change it in the next version of the
patch set, perhaps I will add more columns to the functions.

> All that said, I don't 

Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?

2022-02-05 Thread Tomas Vondra

Hi,

there's been an interesting case [1] of a slow query on pgsql-general,
related to the topic discussed in this thread. It causes the query to run
slower by multiple orders of magnitude, and I think it's interesting, so
let me present a simple example demonstrating it.



create table t1 (a int);
create table t2 (a int);

insert into t1
select i from generate_series(1,100000) s(i);

insert into t2
select mod(i,100000) from generate_series(1,10000000) s(i);

create index on t1(a);
create index on t2(a);

vacuum analyze t1, t2;

-- we need to force mergejoin
set enable_nestloop = off;


Now, let's run a simple query:

SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
 WHERE (t1.a > 99000) ORDER BY t1.a LIMIT 100;

                                QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=4.63..224.57 rows=100 width=8)
        (actual time=8999.487..8999.707 rows=100 loops=1)
   ->  Merge Join  (cost=4.63..209447.97 rows=95226 width=8)
                   (actual time=8999.485..8999.620 rows=100 loops=1)
         Merge Cond: (t1.a = t2.a)
         ->  Index Only Scan using t1_a_idx on t1
                 (cost=0.29..29.25 rows=969 width=4)
                 (actual time=0.010..0.011 rows=1 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
         ->  Index Only Scan using t2_a_idx on t2
                 (cost=0.43..183464.09 rows=977 width=4)
                 (actual time=0.026..4594.757 rows=9900200 loops=1)
               Heap Fetches: 0
 Planning Time: 0.338 ms
 Execution Time: 8999.768 ms
(10 rows)


Now, let's do a simple trick and add a condition on t2.a, "implied" by the
join condition (t1.a = t2.a) and the inequality (t1.a > 99000).



SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
 WHERE (t1.a > 99000) AND (t2.a > 99000) ORDER BY t1.a LIMIT 100;

                                QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=0.77..250.39 rows=100 width=8)
        (actual time=0.040..0.294 rows=100 loops=1)
   ->  Merge Join  (cost=0.77..2297.23 rows=920 width=8)
                   (actual time=0.039..0.172 rows=100 loops=1)
         Merge Cond: (t1.a = t2.a)
         ->  Index Only Scan using t1_a_idx on t1
                 (cost=0.29..29.25 rows=969 width=4)
                 (actual time=0.031..0.031 rows=1 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
         ->  Index Only Scan using t2_a_idx on t2
                 (cost=0.43..2014.87 rows=96596 width=4)
                 (actual time=0.005..0.052 rows=100 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
 Planning Time: 0.222 ms
 Execution Time: 0.414 ms
(11 rows)

Well, that's quite a difference! From 9000ms to 1ms, pretty good.

What is happening in the first plan is that the merge join needs t2 sorted
by t2.a, and the index-only scan looks like a great way to do that, as it
has a low startup cost (which LIMIT likes). But this completely misses that
(t1.a > 99000) implies (t2.a > 99000) through the equality in the join
condition. So we start scanning t2_a_idx, only to throw the first 99% of
tuples away.


In the original report this is particularly egregious, because the index 
only scan looks like this:


  -> Index Only Scan using data_class_pkey on data_class ta
 (cost=0.57..4935483.78 rows=216964862 width=8)
 (actual time=0.018..35022.908 rows=151321889 loops=1)
   Heap Fetches: 151321889

So yeah, 151M heap fetches, that's bound to be expensive :-/

Adding the condition on t2.a allows just skipping the first chunk of the 
index, eliminating the expensive part.


Of course, this breaks the estimates in the faster query, because we now
apply the condition twice - once for the index scan, once as the join
clause. So instead of ~100k rows the join is estimated as ~1000 rows.


I'm also not claiming this is 100% worth it - queries with a suitable
combination of clauses (conditions on the join keys) seem rather
uncommon. But it seems like an interesting example, because it may be
seen either as a missed execution optimization (failing to skip the
initial chunk of rows), or as a costing issue due to not accounting for
having to process the rows (which would likely result in picking a
different plan).




regards

[1] 
https://www.postgresql.org/message-id/CA%2B1Wm9U_sP9237f7OH7O%3D-UTab71DWOO4Qc-vnC78DfsJQBCwQ%40mail.gmail.com


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [BUG]Update Toast data failure in logical replication

2022-02-05 Thread Alvaro Herrera
On 2022-Feb-05, Amit Kapila wrote:

> On Fri, Feb 4, 2022 at 9:06 PM Alvaro Herrera  wrote:
> >
> > I don't have a reason not to commit this patch.
> >
> 
> It is not very clear to me from this so just checking again, are you
> fine with back-patching this as well?

Hmm, of course, I never thought it'd be a good idea to leave the bug
unfixed in back branches.

Thanks!

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El sudor es la mejor cura para un pensamiento enfermo" (Bardia)




Re: row filtering for logical replication

2022-02-05 Thread Amit Kapila
On Fri, Feb 4, 2022 at 2:58 PM houzj.f...@fujitsu.com wrote:
>
> On Thursday, February 3, 2022 11:11 PM houzj.f...@fujitsu.com wrote:
>
> Since the v76--clean-up-pgoutput-cache-invalidation.patch has been
> committed, I'm attaching a new version patch set to make the cfbot happy. I
> also addressed the above comments related to tab-complete in the 0002 patch.
>

I don't like some of the error message changes in this new version. For example:

v75:
+CREATE FUNCTION testpub_rf_func1(integer, integer) RETURNS boolean AS
$$ SELECT hashint4($1) > $2 $$ LANGUAGE SQL;
+CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func1, LEFTARG = integer,
RIGHTARG = integer);
+CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
+ERROR:  invalid publication WHERE expression for relation "testpub_rf_tbl3"
+DETAIL:  User-defined operators are not allowed.

v77
+CREATE FUNCTION testpub_rf_func1(integer, integer) RETURNS boolean AS
$$ SELECT hashint4($1) > $2 $$ LANGUAGE SQL;
+CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func1, LEFTARG = integer,
RIGHTARG = integer);
+CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
+ERROR:  invalid publication WHERE expression
+LINE 1: ...ICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27);
+ ^
+DETAIL:  User-defined or mutable functions are not allowed

I think the detailed message from v75, "DETAIL:  User-defined operators
are not allowed.", will be easier for users to understand. I have made
some code changes and refactoring to make this behavior like the previous
one, without removing the additional checks you added in v77. I have
made a few changes to comments and error messages. Attached is a
top-up patch on your v77 patch series. I suggest we combine the
0001 and 0002 patches as well.

-- 
With Regards,
Amit Kapila.


v77_diff_amit.1.patch
Description: Binary data


Re: Latest LLVM breaks our code again

2022-02-05 Thread Thomas Munro
On Fri, Feb 4, 2022 at 8:12 AM Andres Freund  wrote:
> On 2022-02-03 10:44:11 +0100, Fabien COELHO wrote:
> > For these reasons, I'm inclined to leave seawasp as it is.

It might be easier to use the nightly packages at
https://apt.llvm.org/.  You could update daily and still save so much
CPU that ...

> I find seawasp tracking the development trunk compilers useful. Just not for
> --with-llvm. The latter imo *reduces* seawasp's usefulness, because once
> there's an API change we can't see whether there's e.g. a compiler codegen
> issue leading to crashes or whatnot.
>
> What I was proposing was to remove --with-llvm from seawasp, and have a
> separate animal tracking the newest llvm release branch (I can run/host that
> if needed).

... you could do a couple of variations like that ^ on the same budget :-)

FWIW 14 just branched.  Vive LLVM 15.




Re: Release notes for February minor releases

2022-02-05 Thread Michael Banck
On Fri, Feb 04, 2022 at 04:35:07PM -0500, Tom Lane wrote:
> Michael Banck  writes:
> > On Fri, Feb 04, 2022 at 02:58:59PM -0500, Tom Lane wrote:
> >> +  If this seems to have affected a table, REINDEX
> >> +  should repair the damage.
> 
> > I don't think this is very helpful to the reader, are their indexes
> > corrupt or not? If we can't tell them a specific command to run to
> > check, can we at least mention that running amcheck would detect that
> > (if it actually does)? Otherwise, I guess the only way to be sure is to
> > just reindex every index? Or is this at least specific to b-trees?
> 
> Yeah, I wasn't too happy with that advice either, but it seems like the
> best we can do [1].  I don't think we should advise blindly reindexing
> your whole installation, because it's a very low-probability bug.

Right, ok. I wonder whether it makes sense to at least hint at the low
probability then; I guess if you know Postgres well you can deduce from
the "when the last transaction that could see the tuple ends while the
page is being pruned" that it is a low-probability corner case, but
I fear lots of users will be unable to gauge the chances they got hit by
this bug and will just blindly assume they are affected (and/or ask around).

I just woke up, so I don't have any good wording suggestions yet.


Michael

-- 
Michael Banck
Team Lead, PostgreSQL Team
Project Manager
Tel.: +49 2166 9901-171
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Geoff Richardson, Peter Lilley

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz