Re: fixing CREATEROLE

2022-11-23 Thread Tom Lane
Robert Haas  writes:
> On Wed, Nov 23, 2022 at 12:36 PM Mark Dilger
>  wrote:
>> Yes, this all makes sense, but does it entail that the CREATE ROLE command 
>> must behave differently on the basis of a setting?

> Well, we certainly don't HAVE to add those new role-level properties;
> that's why they are in a separate patch. But I think they add a lot of
> useful functionality for a pretty minimal amount of extra code
> complexity.

I haven't thought about these issues hard enough to form an overall
opinion (though I agree that making CREATEROLE less tantamount
to superuser would be an improvement).  However, I share Mark's
discomfort about making these commands act differently depending on
context.  We learned long ago that allowing GUCs to affect query
semantics was a bad idea.  Basing query semantics on properties
of the issuing role (beyond success-or-permissions-failure) doesn't
seem a whole lot different from that.  It still means that
applications can't just issue command X and expect Y to happen;
they have to inquire about context in order to find out that Z might
happen instead.  That's bad in any case, but it seems especially bad
for security-critical behaviors.
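
A concrete instance of the GUC hazard, for illustration:
standard_conforming_strings changes how a string literal parses.

    -- the same literal parses differently depending on the GUC
    SET standard_conforming_strings = off;
    SELECT 'a\nb';   -- \n is treated as an escape, with a warning
    SET standard_conforming_strings = on;
    SELECT 'a\nb';   -- backslash is an ordinary character: a\nb

An application cannot know what such a query means without first
inquiring about the setting.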

regards, tom lane




Questions regarding distinct operation implementation

2022-11-23 Thread Ankit Kumar Pandey

Hello,


I have questions regarding the distinct operation and would be glad if
someone could help me out.


Consider the following table (mytable):

 id | name
----+------
  1 | A
  1 | A
  2 | B
  3 | A
  1 | A

If we do /select avg(id) over (partition by name) from mytable/, the
partition logic goes like this:

for A: 1, 1, 3, 1

Suppose we want to implement something like /select avg(distinct id)
over (partition by name) from mytable/ by removing duplicates: store
the last datum of the aggregate column (id) and compare it with the
current value. That approach fails here because the aggregate column
is not sorted within the partition.


Questions:

1. Is sorting a prerequisite for finding distinct values?

2. Is it okay to sort the aggregate column (within the partition) for
distinct to work in the case of a window function?

3. Does an alternative way exist to handle this scenario (given that
the sort here is of no use in the aggregation itself)?
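
For reference, the semantics I am asking about can be emulated today
with a correlated subquery (just a sketch of the expected behaviour,
not the executor change itself):

    -- for name 'A' this averages the distinct ids {1, 3}, i.e. 2.0,
    -- instead of {1, 1, 3, 1}
    SELECT name,
           (SELECT avg(DISTINCT m2.id)
            FROM mytable m2
            WHERE m2.name = m.name) AS avg_distinct_id
    FROM mytable m;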



Thanks


--
Regards,
Ankit Kumar Pandey


Re: Question concerning backport of CVE-2022-2625

2022-11-23 Thread Roberto C. Sánchez
Hi Tom,

On Sun, Nov 20, 2022 at 11:43:41AM -0500, Tom Lane wrote:
> 
> It'd likely be a good idea to reproduce this with a gdb breakpoint
> set at errfinish, and see exactly what's leading up to the error.
> 
So, I did as you suggested.  The top few frames of the backtrace were:

#0  errfinish (dummy=0)
at /build/postgresql-9.4-9.4.26/build/../src/backend/utils/error/elog.c:419
#1  0x5563cc733f25 in recordDependencyOnCurrentExtension (
object=object@entry=0x7ffcfc649310, isReplace=isReplace@entry=1 '\001')
at /build/postgresql-9.4-9.4.26/build/../src/backend/catalog/pg_depend.c:184
#2  0x5563cc735b72 in makeOperatorDependencies (tuple=0x5563cd10aaa8)
at 
/build/postgresql-9.4-9.4.26/build/../src/backend/catalog/pg_operator.c:862

The code at pg_depend.c:184 came directly from the CVE-2022-2625 commit,
5919bb5a5989cda232ac3d1f8b9d90f337be2077.  However, when I looked at
pg_operator.c:862 I saw that I had had to omit the following change in
backporting to 9.4:

/* Dependency on extension */
-   recordDependencyOnCurrentExtension(&myself, true);
+   recordDependencyOnCurrentExtension(&myself, isUpdate);

The reason is that the function makeOperatorDependencies() does not have
the parameter isUpdate in 9.4.  I found that the parameter was
introduced in 0dab5ef39b3d9d86e452f6ea60b4f5517d9a, which fixed a
problem with the ALTER OPERATOR command, but which also brought some
structural changes, and it wasn't clear those would be particularly
beneficial in resolving the issue.

In the end, what I settled on was a minor change to pg_operator.c to add
the isUpdate parameter to the signature of makeOperatorDependencies(),
along with updates to the invocations of makeOperatorDependencies() so
that when it is invoked in OperatorCreate() the parameter is passed in.
After that I was able to include the change I had originally omitted and
all the tests passed as written (with appropriate adjustments for
commands that did not support CINE in 9.4).

Thanks again for the suggestion of where to look for the failure!

Regards,

-Roberto

-- 
Roberto C. Sánchez




Re: wake up logical workers after ALTER SUBSCRIPTION

2022-11-23 Thread Nathan Bossart
On Tue, Nov 22, 2022 at 04:59:28PM +0530, Amit Kapila wrote:
> On Tue, Nov 22, 2022 at 6:11 AM Nathan Bossart  
> wrote:
>> While working on avoiding unnecessary wakeups in logical/worker.c (as was
>> done for walreceiver.c in 05a7be9), I noticed that the tests began taking
>> much longer.  This seems to be caused by the reduced frequency of calls to
>> maybe_reread_subscription() in LogicalRepApplyLoop().
> 
> I think it would be interesting to know why tests started taking more
> time after a reduced frequency of calls to
> maybe_reread_subscription(). IIRC, we anyway call
> maybe_reread_subscription for each xact.

At the moment, commands like ALTER SUBSCRIPTION don't wake up the logical
workers for the target subscription, so the next call to
maybe_reread_subscription() may not happen for a while.  Presently, we'll
only sleep up to a second in the apply loop, but with my new
prevent-unnecessary-wakeups patch, we may sleep for much longer.  This
causes wait_for_subscription_sync to take more time after some ALTER
SUBSCRIPTION commands.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: fixing CREATEROLE

2022-11-23 Thread Tom Lane
Robert Haas  writes:
> But having said that, I could certainly change the patches so that any
> user, or maybe just a createrole user since it's otherwise irrelevant,
> can flip the INHERITCREATEDROLE and SETCREATEDROLE bits on their own
> account. There would be no harm in that from a security or auditing
> perspective AFAICS. It would, however, make the handling of those
> flags different from the handling of most other role-level properties.
> That is in fact why things ended up the way that they did: I just made
> the new role-level properties which I added work like most of the
> existing ones.

To be clear, I'm not saying that I know a better answer.  But the fact
that these end up so different from other role-property bits seems to
me to suggest that making them role-property bits is the wrong thing.
They aren't privileges in any usual sense of the word --- if they
were, allowing Alice to flip her own bits would obviously be silly.
But all the existing role-property bits, with the exception of
rolinherit, certainly are in the nature of privileges.

> What I would
> particularly like to hear in such an argument, though, is a theory
> that goes beyond those two particular properties and addresses what
> ought to be done with all the other ones, especially CREATEDB and
> CREATEROLE.

CREATEDB and CREATEROLE don't particularly bother me.  We've talked before
about replacing them with memberships in predefined roles, and that would
be fine.  But the reason nobody's got around to that (IMNSHO) is that it
won't really add much.  The thing that I think is a big wart is
rolinherit.  I don't know quite what to do about it.  But these two new
proposed bits seem to be much the same kind of wart, so I'd rather not
invent them, at least not in the form of role properties.

regards, tom lane




Re: fixing CREATEROLE

2022-11-23 Thread Robert Haas
On Wed, Nov 23, 2022 at 3:59 PM David G. Johnston
 wrote:
> I haven't yet formed a complete thought here but is there any reason we 
> cannot convert the permission-like attributes to predefined roles?
>
> pg_login
> pg_replication
> pg_bypassrls
> pg_createdb
> pg_createrole
> pg_haspassword (password and valid until)
> pg_hasconnlimit
>
> Presently, attributes are never inherited, but having that be controlled via 
> the INHERIT property of the grant seems desirable.

I think that something like this might be possible, but I'm not
convinced that it's a good idea. I've always felt that the role-level
properties seemed kind of like warts, but in studying these issues
recently, I've come to the conclusion that in some ways that's just a
visual impression. The syntax LOOKS outdated and clunky, whereas
granting someone a predefined role feels clean and modern. But the
reality is that the predefined roles system is full of really
unpleasant warts. For example, in talking through the now-committed
patch to allow control over SET ROLE, we had some fairly extensive
discussion of the fact that there was previously no way to prevent a
user who has been granted the pg_read_all_stats predefined role from
creating objects owned by pg_read_all_stats, or from altering existing
objects. That's really pretty grotty. We now have a way to prevent
that, but perhaps we should have something even better. I'm also not
really sure that's the only problem here, but maybe it is.
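
For concreteness, the now-committed control I mean looks roughly like
this (a sketch):

    -- alice can use the privileges of pg_read_all_stats, but cannot
    -- SET ROLE to it, and so cannot create objects owned by it
    GRANT pg_read_all_stats TO alice WITH SET FALSE;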

Either way, I'm not quite sure what the benefit of converting these
things to predefined roles is. I think actually the strongest argument
would be to do this for the superuser property! Make the bootstrap
superuser the only real superuser, and anyone else who wants to be a
superuser has to inherit that from that role. It's really unclear to
me why inheriting a lot of permissions is allowable, but inheriting
all of them is not allowable. Doing it for something like
pg_hasconnlimit seems pretty unappealing by contrast, because that's
an integer-valued property, not a Boolean, and it's not at all clear
to me why that should be inherited or what the semantics ought to be.
Really, I'm not that tempted to try to rejigger this kind of stuff
around because it seems like a lot of work for not a whole lot of
benefit. I think there's a perfectly reasonable case for setting some
things on a per-role basis that are actually per-role and not
inherited. A password is a fine example of that. You should never
inherit someone else's password. Whether we've chosen the right set of
things to treat as per-role properties rather than predefined roles is
very much debatable, though, as are a number of other aspects of the
role system.

For instance, I'm pretty well unconvinced that merging users and
groups into a unified thing called roles was a good idea. I think it
makes all of this machinery a LOT harder to understand, which may be
part of the reason why this area doesn't seem to have had much TLC in
quite a long time. But I think it's too late to revisit that decision,
and I also think it's too late to revisit the question of having
predefined roles at all. For better or for worse, that's what we did,
and what remains now is to find a way to make the best of it in light
of those decisions.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: fixing CREATEROLE

2022-11-23 Thread Robert Haas
On Wed, Nov 23, 2022 at 4:18 PM Tom Lane  wrote:
> To be clear, I'm not saying that I know a better answer.  But the fact
> that these end up so different from other role-property bits seems to
> me to suggest that making them role-property bits is the wrong thing.
> They aren't privileges in any usual sense of the word --- if they
> were, allowing Alice to flip her own bits would obviously be silly.
> But all the existing role-property bits, with the exception of
> rolinherit, certainly are in the nature of privileges.

I think that's somewhat true, but I don't completely agree. I don't
think that INHERIT, LOGIN, CONNECTION LIMIT, PASSWORD, or VALID UNTIL
are privileges either. I think they're just properties. I would put
these in the same category: properties, not privileges. I think that
SUPERUSER, CREATEDB, CREATEROLE, REPLICATION, and BYPASSRLS are
privileges.

> CREATEDB and CREATEROLE don't particularly bother me.  We've talked before
> about replacing them with memberships in predefined roles, and that would
> be fine.  But the reason nobody's got around to that (IMNSHO) is that it
> won't really add much.

I agree, although I'm not sure that means that we don't need to do
anything about them as we evolve the system.

> The thing that I think is a big wart is
> rolinherit.  I don't know quite what to do about it.

One option is to nuke it from orbit. Now that you can set that
property on a per-grant basis, the per-role basis serves only to set
the default. I think that's of dubious value, and arguably backwards,
because ISTM that in a lot of cases whether you want a role grant to
be inherited will depend on the nature of the role being granted
rather than the role to which it is being granted. The setting we have
works the other way around, and I can never keep in my head what the
use case for that is. I think there must be one, though, because Peter
Eisentraut seemed to like having it around. I don't understand why,
but I respect Peter. :-)

> But these two new
> proposed bits seem to be much the same kind of wart, so I'd rather not
> invent them, at least not in the form of role properties.

I have to admit that when I realized that was the natural place to put
them to make the patch work, my first reaction internally was "well,
that can't possibly be right, role properties suck!". But I didn't and
still don't see where else to put them that makes any sense at all, so
I eventually decided that my initial reaction was misguided. So I
can't really blame you for not liking it either, and would be happy if
we could come up with something else that feels better. I just don't
know what it is: at least as of this moment in time, I believe these
naturally ARE properties of the role, and therefore I'm inclined to
view my initial reluctance to implement it that way as a reflex rather
than a well-considered opinion. That is, the CREATE ROLE syntax is
clunky, and it controls some things that are properties and others
that are permissions, but they're not inherited like regular
permissions, so it stinks, and thus adding things to it also feels
stinky. But if the existing command weren't such a mess I'm not sure
adding this stuff to it would feel bad either.

That might be the wrong view. As I say, I'm open to other ideas, and
it's possible there's some nicer way to do it that I just don't see
right now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: postgres_fdw binary protocol support

2022-11-23 Thread Greg Stark
On Tue, 22 Nov 2022 at 08:17, Ashutosh Bapat
 wrote:
>
> AFAIU, binary compatibility of two postgresql servers depends upon the
> binary compatibility of the platforms on which they run.

No, libpq binary mode is not architecture-specific. I think you're
thinking of on-disk binary compatibility. But libpq binary mode is
just a binary network representation of the data instead of an ASCII
representation. It should be faster and more efficient, but it still
goes through the per-type binary input/output functions (which are
actually named send/receive rather than input/output).
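
Those per-type functions are catalogued alongside the text ones, for
instance:

    -- text I/O (typinput/typoutput) vs. binary I/O (typreceive/typsend)
    SELECT typname, typinput, typoutput, typreceive, typsend
    FROM pg_type
    WHERE typname = 'int4';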

I actually wonder if having this would be a good way to get some code
coverage of the binary input/output functions which I suspect is sadly
lacking now. It wouldn't necessarily test that they're doing what
they're supposed to... but at least they would be getting run which I
don't think they are currently?

-- 
greg




Re: Document parameter count limit

2022-11-23 Thread David G. Johnston
On Wed, Nov 23, 2022 at 11:47 AM Tom Lane  wrote:

> Bruce Momjian  writes:
> > Does this come up enough to document it?  I assume the error message the
> > user receives is clear.
>
> Looks like you get
>
> if (nParams < 0 || nParams > PQ_QUERY_PARAM_MAX_LIMIT)
> {
>     libpq_append_conn_error(conn, "number of parameters must be between 0 and %d",
>                             PQ_QUERY_PARAM_MAX_LIMIT);
>     return 0;
> }
>
> which seems clear enough.
>
> I think the concern here is that somebody who's not aware that a limit
> exists might write an application that thinks it can send lots of
> parameters, and then have it fall over in production.  Now, I've got
> doubts that an entry in the limits.sgml table will do much to prevent
> that scenario.  But perhaps offering the advice to use an array parameter
> will be worthwhile even after-the-fact.
>

It comes up enough in places I troll that having a link to drop into a
reply would be nice.
I do believe that people who want to use a large parameter list likely have
that question in the back of their mind, and looking at a page called
"System Limits" is at least plausibly something they would do.  Since they
are really caring about parse-bind-execute, and they aren't likely to dig
into libpq, this seems like the best spot (as opposed to, say PREPARE)
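
For reference, the array-parameter rewrite Tom suggests looks
something like this (a sketch):

    -- instead of one scalar parameter per value:
    --   SELECT * FROM tbl WHERE id IN ($1, $2, ..., $65535);
    -- bind a single array parameter:
    SELECT * FROM tbl WHERE id = ANY ($1::int[]);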

David J.


Re: drop postmaster symlink

2022-11-23 Thread Andres Freund
Hi,

On 2022-11-23 10:07:49 -0500, Tom Lane wrote:
> Devrim Gündüz  writes:
> > ...and it helps us to find the "main" process a bit easily.
> 
> Hmm, that's a nontrivial point perhaps.  It's certain that this
> will break some other people's start scripts too.

OTOH, postmaster has been deprecated for ~15 years.


> On the whole, is it really that hard to add the symlink to the meson build?

No. Meson has a builtin command for it, just not in the meson version we're
currently requiring. We can create the symlink ourselves instead. The problem
is just detecting the systems where we can't symlink, and deciding what to
fall back to there.

Greetings,

Andres Freund




Re: predefined role(s) for VACUUM and ANALYZE

2022-11-23 Thread Andrew Dunstan


On 2022-11-20 Su 11:57, Nathan Bossart wrote:
> On Sat, Nov 19, 2022 at 10:50:04AM -0800, Nathan Bossart wrote:
>> another rebase
> Another rebase for cfbot.
>


I have committed the first couple of these to get them out of the way.

But I think we need a bit of cleanup in the next patch.
vacuum_is_relation_owner() looks like it's now rather misnamed. Maybe
vacuum_is_permitted_for_relation()? Also I think we need a more thorough
reworking of the comments around line 566. And I think we need a more
detailed explanation of why the change in vacuum_rel is ok, and if it is
OK we should adjust the head comment on the function.

In any case I think this comment would be better English with "might"
instead of "may":

/* user may have the ANALYZE privilege */


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: fixing CREATEROLE

2022-11-23 Thread Robert Haas
On Wed, Nov 23, 2022 at 3:11 PM Mark Dilger
 wrote:
> Ok, so the critical part of this proposal is that auditing tools can tell 
> when Alice circumvents these settings.  Without that bit, the whole thing is 
> inane.  Why make Alice jump through hoops that you are explicitly allowing 
> her to jump through?  Apparently the answer is that you can point a high 
> speed camera at the hoops.

Well put.

Also, it's a bit like 'su', right? Typically you don't just log in as
root and do everything as root, even if you have access to root
privileges. You log in as 'mdilger' or whatever and then when you want
to exercise elevated privileges you use 'su' or 'sudo' or something.
Similarly here you can make an argument that it's a lot cleaner to
give Alice the potential to access all of these privileges than to
make her have them all the time.

But on the flip side, one big advantage of having 'alice' have the
privileges all the time is that, for example, she can probably restore
a database dump that might otherwise be restorable only with superuser
privileges. As long as she has been granted all the relevant roles
with INHERIT TRUE, SET TRUE, the kinds of locutions that pg_dump spits
out should pretty much work fine, whereas if Alice is firewalled from
the privileges of the roles she manages, that is not going to work
well at all. To me, that is a pretty huge advantage, and it's a major
reason why I initially thought that alice should just categorically,
always inherit the privileges of the roles she creates.
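
The kind of grant I mean, sketched:

    -- alice can both exercise bob's privileges (INHERIT) and
    -- become him (SET)
    GRANT bob TO alice WITH INHERIT TRUE, SET TRUE;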

But having been burned^Wenlightened by previous community discussion,
I can now see both sides of the argument, which is why I am now
proposing to let people pick the behavior they happen to want.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: More efficient build farm animal wakeup?

2022-11-23 Thread Thomas Munro
On Thu, Nov 24, 2022 at 10:00 AM Magnus Hagander  wrote:
> On Wed, Nov 23, 2022 at 9:15 AM Thomas Munro  wrote:
> Are you saying you still think it's worth pursuing longpoll or similar 
> methods for it, or that this is good enough?

I personally think it'd be pretty neat, to squeeze out that last bit
of latency.  Maybe it's overkill...

The best idea I have so far for how to make it work more like a
service but still require nothing more than cron (so it's not hard for
people on systems where they don't even have root) is to have it start
if not already running (current lock file scheme already does that)
AND if some file buildfarm_enabled exists, or buildfarm_disabled
doesn't exist or something like that, and then keep running while
that's true.  So if you need to turn it off for a while you can just
touch/rm that, but normally it'll keep running its wait loop forever,
and start up soon after a reboot; maybe it also exits if you touch the
config file so it can restart next time and reread it, or something
like that.  Then it can spend all day in a loop that does 120s long
polls, and start builds within seconds of a new commit landing.

Curious to know how you'd build the server side.  You mentioned a
commit hook notifying some kind of long poll distributor.  Would you
use a Twisted/async/whatever-based server that knows how to handle
lots of sockets efficiently, or just use old school web server tech
that would block waiting for NOTIFY or something like that?  You'd
probably get away with that for the small numbers of animals involved
(I mean, a couple of hundred web server threads/processes just sitting
there waiting would be borderline acceptable I guess).  But it'd be
more fun to do it with async magic.




Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2022-11-23 Thread Tom Lane
Thomas Munro  writes:
> On Wed, Nov 23, 2022 at 11:03 PM Thomas Munro  wrote:
>> I assume this is ext4.  Presumably anything that reads the
>> controlfile, like pg_ctl, pg_checksums, pg_resetwal,
>> pg_control_system(), ... by reading without interlocking against
>> writes could see garbage.  I have lost track of the versions and the
>> thread, but I worked out at some point by experimentation that this
>> only started relatively recently for concurrent read() and write(),
>> but always happened with concurrent pread() and pwrite().  The control
>> file uses the non-p variants, which didn't mash old/new data like
>> grated cheese under concurrency due to some implementation detail, but
>> now do.

Ugh.

> As for what to do about it, some ideas:
> 2.  Retry after a short time on checksum failure.  The probability is
> already minuscule, and becomes pretty close to 0 if we read thrice
> 100ms apart.

> First thought is that 2 is appropriate level of complexity for this
> rare and stupid problem.

Yeah, I was thinking the same.  A variant could be "repeat until
we see the same calculated checksum twice".

regards, tom lane






Re: fixing CREATEROLE

2022-11-23 Thread Robert Haas
On Wed, Nov 23, 2022 at 12:36 PM Mark Dilger
 wrote:
> Oh, I don't mean that it will be poorly documented.  I mean that the way the 
> command is written won't advertise what it is going to do.  That's concerning 
> if you fat-finger a CREATE ROLE command, then realize you need to drop and 
> recreate the role, only to discover that a property you weren't thinking 
> about, and which you are accustomed to being set the opposite way, is set 
> such that you can't drop the role you just created.

That doesn't ever happen. No matter how the properties are set, you
end up with ADMIN OPTION on the newly-created role and can drop it.
The flags control things like whether you can select from the newly
created role's tables even if you otherwise lack permissions on them
(INHERIT), and whether you can SET ROLE to it (SET). You can always
administer it, i.e. grant rights on it to others, change its password,
drop it.
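
A sketch of what that means in practice (syntax as proposed in this
patch set, so the details may shift):

    SET ROLE alice;      -- alice has CREATEROLE
    CREATE ROLE bob;     -- implicitly: GRANT bob TO alice WITH ADMIN OPTION
    GRANT bob TO carol;               -- alice can administer bob ...
    ALTER ROLE bob PASSWORD 'secret';
    DROP ROLE bob;       -- ... and always drop him, whatever the flags say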

> I think if you're going to create-and-disown something, you should have to 
> say so, to make sure you mean it.

Reasonable, but not relevant, since that isn't what's happening.

> Why not make this be a permissions issue, rather than a default behavior 
> issue?  Instead of a single CREATEROLE privilege, grant roles privileges to 
> CREATE-WITH-INHERIT, CREATE-WITH-ADMIN, CREATE-SANS-INHERIT, 
> CREATE-SANS-ADMIN, and thereby limit which forms of the command they may 
> execute.  That way, the semantics of the command do not depend on some 
> property external to the command.  Users (and older scripts) will expect the 
> traditional syntax to behave consistently with how CREATE ROLE has worked in 
> the past.  The behaviors can be specified explicitly.

Perhaps if we get the confusion above cleared up you won't be as
concerned about this, but let me just say that this patch is
absolutely breaking backward compatibility. I don't feel bad about
that, either. I think it's a good thing in this case, because the
current behavior is abjectly broken and horrible. What we've been
doing for the last several years is shipping a product that has a
built-in exploit that a clever 10-year-old could use to escalate
privileges from CREATEROLE to SUPERUSER. We should not be OK with
that, and we should be OK with changing the behavior however much is
required to fix it. I'm personally of the opinion that this patch set
does a rather clever job minimizing that blast radius, but that might
be my own bias as the patch author. Regardless, I don't think there's
any reasonable argument for maintaining the current behavior. I don't
entirely follow exactly what you have in mind in the sentence above,
but if it involves keeping the current CREATEROLE behavior around in
any form, -1 from me.

> > But since this implicit grant has, and must have, the bootstrap
> > superuser as grantor, it is also only reasonable that superusers get
> > to determine what options are used when creating that grant, rather
> > than leaving that up to the CREATEROLE user.
>
> Yes, this all makes sense, but does it entail that the CREATE ROLE command 
> must behave differently on the basis of a setting?

Well, we certainly don't HAVE to add those new role-level properties;
that's why they are in a separate patch. But I think they add a lot of
useful functionality for a pretty minimal amount of extra code
complexity.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: code cleanups

2022-11-23 Thread Robert Haas
On Wed, Nov 23, 2022 at 12:53 PM Tom Lane  wrote:
> at least for bool arrays, that's true of memset'ing as well.  But this,
> if you decide you need something other than zeroes, is a foot-gun.
> In particular, someone whose C is a bit weak might mistakenly think that
>
> bool        nulls[PG_STAT_GET_RECOVERY_PREFETCH_COLS] = {true};
>
> will set all the array elements to true.  Nor is there a plausible
> argument that this is more efficient.  So I don't care for this approach
> and I don't want to adopt it.

I don't really know what the argument is for the explicit initializer
style, but I think this argument against it is pretty weak.

It should be more than fine to assume that anyone who is hacking on
PostgreSQL is proficient in C. It's true that there might be some
people who aren't, or who aren't familiar with the limitations of the
initializer construct, and I include myself in that latter category. I
don't think it was part of C when I learned C. But if we don't possess
the collective expertise as a project to bring people who have missed
these details of the C programming language up to speed, we should
just throw in the towel now and go home.

Hacking on PostgreSQL is HARD and it relies on knowing FAR more than
just the basics of how to code in C. Put differently, if you can't
even figure out how C works, you have no chance of doing anything very
interesting with the PostgreSQL code base, because you're going to
have to figure out a lot more than the basics of the implementation
language to make a meaningful contribution.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Document parameter count limit

2022-11-23 Thread Bruce Momjian
On Thu, Nov 10, 2022 at 11:01:18AM -0700, David G. Johnston wrote:
> On Thu, Nov 10, 2022 at 10:58 AM Corey Huinker  
> wrote:
> 
> 
> +    if you are reading this preparatorily, please redesign your
> query to use temporary tables or arrays
> 
> 
> I agree with the documentation of this parameter.
> I agree with dissuading anyone from attempting to change it
> The wording is bordering on snark (however well deserved) and I think the
> voice is slightly off.
> 
> Alternate suggestion:
> 
> 
> Queries approaching this limit usually can be refactored to use arrays
> or temporary tables, thus reducing parameter overhead.
> 
> 
> The bit about parameter overhead appeals to the reader's desire for
> performance, rather than just focusing on "you shouldn't want this".
> 
> 
> Yeah, the wording is a bit tongue-in-cheek.  Figured assuming a committer 
> wants
> this at all we'd come up with better wording.  I like your suggestion.

Does this come up enough to document it?  I assume the error message the
user receives is clear.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Another multi-row VALUES bug

2022-11-23 Thread Tom Lane
Dean Rasheed  writes:
> On Wed, 23 Nov 2022 at 15:30, Tom Lane  wrote:
>> Hmm ... this patch does not feel any more principled or future-proof
>> than what it replaces, because now instead of making assumptions
>> about what's in the jointree, you're making assumptions about what's
>> in the targetlist.

> True, but it's consistent with what rewriteValuesRTE() does -- it has
> to examine the targetlist to work out how items in the VALUES lists
> are mapped to attributes of the target relation.

That argument seems a little circular, because rewriteValuesRTE
is taking it on faith that it's told the correct RTE to modify.

>> I'm not 100% sure that product-query rewriting would always produce
>> a FROM-list in this order, but I think it might be true.

> No, the test case using rule r3 is a counter-example. In that case,
> the product query has 2 VALUES RTEs, both of which appear in the
> fromlist, and it's the second one that needs rewriting when it
> recurses into the product query.

Ah, right.  I wonder if somehow we could just make one pass over
all the VALUES RTEs, and process each one as needed?  The problem
is to identify the relevant target relation, I guess.
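
For anyone following along, a heavily hedged sketch of the shape being
discussed (not the actual regression-test rule, just an illustration
of how a product query can end up with two VALUES RTEs in its
fromlist, where it's the second one whose DEFAULTs need rewriting):

    CREATE TABLE t1 (a int, b int DEFAULT 0);
    CREATE TABLE t2 (a int, b int DEFAULT -1);
    CREATE RULE r3 AS ON INSERT TO t1
        DO ALSO INSERT INTO t2 VALUES (NEW.a, DEFAULT), (NEW.a + 1, DEFAULT);
    INSERT INTO t1 VALUES (1, DEFAULT), (2, DEFAULT);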

regards, tom lane




Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-11-23 Thread Jacob Champion
On 11/23/22 01:58, mahendrakar s wrote:
> We validated on  libpq handling OAuth natively with different flows
> with different OIDC certified providers.
> 
> Flows: Device Code, Client Credentials and Refresh Token.
> Providers: Microsoft, Google and Okta.

Great, thank you!

> Also validated with OAuth provider Github.

(How did you get discovery working? I tried this and had to give up
eventually.)

> We propose using OpenID Connect (OIDC) as the protocol, instead of
> OAuth, as it is:
> - Discovery mechanism to bridge the differences and provide metadata.
> - Stricter protocol and certification process to reliably identify
> which providers can be supported.
> - OIDC is designed for authentication, while the main purpose of OAUTH is to
> authorize applications on behalf of the user.

How does this differ from the previous proposal? The OAUTHBEARER SASL
mechanism already relies on OIDC for discovery. (I think that decision
is confusing from an architectural and naming standpoint, but I don't
think they really had an alternative...)

> Github is not OIDC certified, so won’t be supported with this proposal.
> However, it may be supported in the future through the ability for the
> extension to provide custom discovery document content.

Right.

> OpenID configuration has a well-known discovery mechanism
> for the provider configuration URI which is
> defined in OpenID Connect. It allows libpq to fetch
> metadata about provider (i.e endpoints, supported grants, response types, 
> etc).

Sure, but this is already how the original PoC works. The test suite
implements an OIDC provider, for instance. Is there something different
to this that I'm missing?

> In the attached patch (based on V2 patch in the thread and does not
> contain Samay's changes):
> - Provider can configure issuer url and scope through the options hook.)
> - Server passes on an open discovery url and scope to libpq.
> - Libpq handles OAuth flow based on the flow_type sent in the
> connection string [1].
> - Added callbacks to notify a structure to client tools if OAuth flow
> requires user interaction.
> - Pg backend uses hooks to validate bearer token.

Thank you for the sample!

> Note that authentication code flow with PKCE for GUI clients is not
> implemented yet.
> 
> Proposed next steps:
> - Broaden discussion to reach agreement on the approach.

High-level thoughts on this particular patch (I assume you're not
looking for low-level implementation comments yet):

0) The original hook proposal upthread, I thought, was about allowing
libpq's flow implementation to be switched out by the application. I
don't see that approach taken here. It's fine if that turned out to be a
bad idea, of course, but this patch doesn't seem to match what we were
talking about.

1) I'm really concerned about the sudden explosion of flows. We went
from one flow (Device Authorization) to six. It's going to be hard
enough to validate that *one* flow is useful and can be securely
deployed by end users; I don't think we're going to be able to maintain
six, especially in combination with my statement that iddawc is not an
appropriate dependency for us.

I'd much rather give applications the ability to use their own OAuth
code, and then maintain within libpq only the flows that are broadly
useful. This ties back to (0) above.

2) Breaking the refresh token into its own pseudoflow is, I think,
passing the buck onto the user for something that's incredibly security
sensitive. The refresh token is powerful; I don't really want it to be
printed anywhere, let alone copy-pasted by the user. Imagine the
phishing opportunities.

If we want to support refresh tokens, I believe we should be developing
a plan to cache and secure them within the client. They should be used
as an accelerator for other flows, not as their own flow.

3) I don't like the departure from the OAUTHBEARER mechanism that's
presented here. For one, since I can't see a sample plugin that makes
use of the "flow type" magic numbers that have been added, I don't
really understand why the extension to the mechanism is necessary.

For two, if we think OAUTHBEARER is insufficient, the people who wrote
it would probably like to hear about it. Claiming support for a spec,
and then implementing an extension without review from the people who
wrote the spec, is not something I'm personally interested in doing.

4) The test suite is still broken, so it's difficult to see these things
in practice for review purposes.

> - Implement libpq changes without iddawc

This in particular will be much easier with a functioning test suite,
and with a smaller number of flows.

> - Prototype GUI flow with pgAdmin

Cool!

Thanks,
--Jacob




Re: fixing CREATEROLE

2022-11-23 Thread Robert Haas
On Wed, Nov 23, 2022 at 2:28 PM Mark Dilger
 wrote:
> > On Nov 23, 2022, at 11:02 AM, Robert Haas  wrote:
> > For me, this
> > clearly falls into the "good" category: it's configuration that you
> > put into the database that makes things happen the way you want, not a
> > behavior-changing setting that comes along and ruins somebody's day.
>
> I had incorrectly imagined that if the bootstrap superuser granted CREATEROLE 
> to Alice with particular settings, those settings would limit the things that 
> Alice could do when creating role Bob, specifically limiting how much she 
> could administer/inherit/set role Bob thereafter.  Apparently, your proposal 
> only configures what happens by default, and Alice can work around that if 
> she wants to.

Right.

> But if that's the case, did I misunderstand upthread that these are 
> properties the superuser specifies about Alice?  Can Alice just set these 
> properties about herself, so she gets the behavior she wants?  I'm confused 
> now about who controls these settings.

Because they are role-level properties, they can be set by whoever has
ADMIN OPTION on the role. That always includes every superuser, and it
never includes Alice herself (except if she's a superuser). It could
include other users depending on the system configuration. For
example, under this proposal, the superuser might create alice and set
her account to CREATEROLE, configuring the INHERITCREATEDROLES and
SETCREATEDROLES properties on Alice's account according to preference.
Then, alice might create another user, say bob, and make him
CREATEROLE as well. In such a case, either the superuser or alice
could set these properties for role bob, because alice enjoys ADMIN
OPTION on role bob.
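
A sketch of that flow (property names as spelled in this thread; the
negative form and exact syntax here are assumptions, not the patch's
final grammar):

    -- as superuser:
    CREATE ROLE alice CREATEROLE INHERITCREATEDROLES SETCREATEDROLES;
    -- as alice:
    CREATE ROLE bob CREATEROLE;
    -- either the superuser or alice may now flip bob's properties,
    -- because alice holds ADMIN OPTION on bob:
    ALTER ROLE bob NOSETCREATEDROLES;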

Somewhat to one side of the question you were asking, but related to
the above, I believe there is an opportunity, and perhaps a need, to
modify the scope of CREATEROLE in terms of what role-level options a
CREATEROLE user can set. For instance, if a CREATEROLE user doesn't
have CREATEDB, they can still create users and give them that
privilege, even with these patches, and likewise these two new
properties. This patch is only concerned about which roles you can
manipulate, not what role-level properties you can set. Somebody might
feel that's a serious problem, and they might even feel that this
patch set ought to something about it. In my view, the issues are
somewhat severable. I don't think you can do anything as evil by
setting role-level properties (except for SUPERUSER, of course) as
what you can do by granting predefined roles, so I don't find
restricting those capabilities to be as urgent as doing something to
restrict role grants.

Also, and this definitely plays into it too, I think there's some
debate about what the model ought to look like there. For instance,
you could simply stipulate that you can't give what you don't have,
but that would mean that every  CREATEROLE user can create additional
CREATEROLE users, and I suspect some people might like to restrict
that. We could add a new CREATECREATEROLE property to decide whether a
user can make CREATEROLE users, but by that argument we'd also need a
new CREATECREATECREATEROLE property to decide whether a role can make
CREATECREATEROLE users, and then it just recurses indefinitely from
there. Similarly for CREATEDB. Also, what if anything should you
restrict about how the new INHERITCREATEDROLES and SETCREATEDROLES
properties should be set? You could argue that they ought to be
superuser-only (because the implicit grant is performed by the
bootstrap superuser) or that it's fine for them to be set by a
CREATEROLE user with ADMIN OPTION (because it's not all that
security-critical how they get set) or maybe even that a user ought to
be able to set those properties on his or her own role.

I'm not very certain about any of that stuff; I don't have a clear
mental model of how it should work, or even what exact problem we're
trying to solve. To me, the patches that I posted make sense as far as
they go, but I'm not under the illusion that they solve all the
problems in this area, or even that I understand what all of the
problems are.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: fixing CREATEROLE

2022-11-23 Thread Mark Dilger



> On Nov 23, 2022, at 12:04 PM, Robert Haas  wrote:
> 
>> But if that's the case, did I misunderstand upthread that these are 
>> properties the superuser specifies about Alice?  Can Alice just set these 
>> properties about herself, so she gets the behavior she wants?  I'm confused 
>> now about who controls these settings.
> 
> Because they are role-level properties, they can be set by whoever has
> ADMIN OPTION on the role. That always includes every superuser, and it
> never includes Alice herself (except if she's a superuser). It could
> include other users depending on the system configuration. For
> example, under this proposal, the superuser might create alice and set
> her account to CREATEROLE, configuring the INHERITCREATEDROLES and
> SETCREATEDROLES properties on Alice's account according to preference.
> Then, alice might create another user, say bob, and make him
> CREATEROLE as well. In such a case, either the superuser or alice
> could set these properties for role bob, because alice enjoys ADMIN
> OPTION on role bob.

Ok, so the critical part of this proposal is that auditing tools can tell when 
Alice circumvents these settings.  Without that bit, the whole thing is inane.  
Why make Alice jump through hoops that you are explicitly allowing her to jump 
through?  Apparently the answer is that you can point a high speed camera at 
the hoops.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: fixing CREATEROLE

2022-11-23 Thread Tom Lane
Robert Haas  writes:
> On Wed, Nov 23, 2022 at 2:28 PM Mark Dilger
>  wrote:
>> I had incorrectly imagined that if the bootstrap superuser granted
>> CREATEROLE to Alice with particular settings, those settings would
>> limit the things that Alice could do when creating role Bob,
>> specifically limiting how much she could administer/inherit/set role
>> Bob thereafter.  Apparently, your proposal only configures what happens
>> by default, and Alice can work around that if she wants to.

> Right.

Okay ...

>> But if that's the case, did I misunderstand upthread that these are
>> properties the superuser specifies about Alice?  Can Alice just set
>> these properties about herself, so she gets the behavior she wants?
>> I'm confused now about who controls these settings.

> Because they are role-level properties, they can be set by whoever has
> ADMIN OPTION on the role. That always includes every superuser, and it
> never includes Alice herself (except if she's a superuser).

That is just bizarre.  Alice can do X, and she can do Y, but she
can't control a flag that says which of those happens by default?
How is that sane (disregarding the question of whether the existence
of the flag is a good idea, which I'm now even less sold on)?

regards, tom lane




Re: Document parameter count limit

2022-11-23 Thread Justin Pryzby
On Wed, Nov 23, 2022 at 12:35:59PM -0700, David G. Johnston wrote:
> On Wed, Nov 23, 2022 at 11:47 AM Tom Lane  wrote:
> 
> > Bruce Momjian  writes:
> > > Does this come up enough to document it?  I assume the error message the
> > > user receives is clear.
> >
> > Looks like you get
> >
> > if (nParams < 0 || nParams > PQ_QUERY_PARAM_MAX_LIMIT)
> > {
> >     libpq_append_conn_error(conn, "number of parameters must be between 0 and %d",
> >                             PQ_QUERY_PARAM_MAX_LIMIT);
> >     return 0;
> > }
> >
> > which seems clear enough.
> >
> > I think the concern here is that somebody who's not aware that a limit
> > exists might write an application that thinks it can send lots of
> > parameters, and then have it fall over in production.  Now, I've got
> > doubts that an entry in the limits.sgml table will do much to prevent
> > that scenario.  But perhaps offering the advice to use an array parameter
> > will be worthwhile even after-the-fact.

Yes, that's what happens :)

I hit that error after increasing the number of VALUES (),() rows a
loader used in a prepared statement (and that was with our non-wide
tables).

+1 to document the limit along with the other limits.

-- 
Justin




Re: More efficient build farm animal wakeup?

2022-11-23 Thread Magnus Hagander
On Wed, Nov 23, 2022 at 9:15 AM Thomas Munro  wrote:

> On Wed, Nov 23, 2022 at 2:09 PM Andres Freund  wrote:
> > It's a huge improvement here.
>
> Same here. eelpout + elver looking good, just a fraction of a second
> hitting that web server each minute.  Long polling will be better and
> shave off 30 seconds (+/- 30) on start time, but this avoids a lot of
> useless churn without even needing a local mirror.  Thanks Andrew!
>

Are you saying you still think it's worth pursuing longpoll or similar
methods for it, or that this is good enough?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: fixing CREATEROLE

2022-11-23 Thread David G. Johnston
On Wed, Nov 23, 2022 at 2:18 PM Robert Haas  wrote:

> On Wed, Nov 23, 2022 at 3:59 PM David G. Johnston
>  wrote:
> > I haven't yet formed a complete thought here but is there any reason we
> cannot convert the permission-like attributes to predefined roles?
> >
> > pg_login
> > pg_replication
> > pg_bypassrls
> > pg_createdb
> > pg_createrole
> > pg_haspassword (password and valid until)
> > pg_hasconnlimit
> >
> > Presently, attributes are never inherited, but having that be controlled
> via the INHERIT property of the grant seems desirable.
>
> I think that something like this might be possible, but I'm not
> convinced that it's a good idea.
>


> Either way, I'm not quite sure what the benefit of converting these
> things to predefined roles is.


Specifically, you gain inheritance/set and "admin option" for free.  So
whether I have an ability and whether I can grant it are separate concerns.



> A password is a fine example of that. You should never
> inherit someone else's password. Whether we've chosen the right set of
> things to treat as per-role properties rather than predefined roles is
> very much debatable, though, as are a number of other aspects of the
> role system.
>

You aren't inheriting a specific password, you are inheriting the right to
have a password stored in the database, with an optional expiration date.

>
> For instance, I'm pretty well unconvinced that merging users and
> groups into a unified thing called roles was a good idea.


I agree.  No one was interested in the (admittedly complex) psql queries I
wrote the other month, but I decided to undo some of that decision there.

David J.


Re: Document parameter count limit

2022-11-23 Thread Tom Lane
Bruce Momjian  writes:
> Does this come up enough to document it?  I assume the error message the
> user receives is clear.

Looks like you get

    if (nParams < 0 || nParams > PQ_QUERY_PARAM_MAX_LIMIT)
    {
        libpq_append_conn_error(conn, "number of parameters must be between 0 and %d",
                                PQ_QUERY_PARAM_MAX_LIMIT);
        return 0;
    }

which seems clear enough.

I think the concern here is that somebody who's not aware that a limit
exists might write an application that thinks it can send lots of
parameters, and then have it fall over in production.  Now, I've got
doubts that an entry in the limits.sgml table will do much to prevent
that scenario.  But perhaps offering the advice to use an array parameter
will be worthwhile even after-the-fact.

regards, tom lane




Re: Add sub-transaction overflow status in pg_stat_activity

2022-11-23 Thread Bruce Momjian
On Mon, Nov 14, 2022 at 10:09:57AM -0500, Robert Haas wrote:
> I think I fundamentally disagree with the idea that we should refuse
> to expose instrumentation data because some day the internals might
> change. If we accepted that argument categorically, we wouldn't have
> things like backend_xmin or backend_xid in pg_stat_activity, or wait
> events either, but we do have those things and users find them useful.
> They suck in the sense that you need to know quite a bit about how the
> internals work in order to use them to find problems, but people who
> want to support production PostgreSQL instances have to learn about
> how those internals work one way or the other because they
> demonstrably matter. It is absolutely stellar when we can say "hey, we

I originally thought having this value in pg_stat_activity was overkill,
but seeing the other internal/warning columns in that view, I think it
makes sense.  Oddly, is our 64 snapshot performance limit even
documented anywhere?  I know it is in Simon's patch I am working on.
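
For context, the existing instrumentation Robert refers to is already
visible today, e.g.:

    SELECT pid, backend_xid, backend_xmin, wait_event_type, wait_event
    FROM pg_stat_activity;

so this would be one more column of the same character.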

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: fixing CREATEROLE

2022-11-23 Thread Mark Dilger



> On Nov 23, 2022, at 11:02 AM, Robert Haas  wrote:
> 
> For me, this
> clearly falls into the "good" category: it's configuration that you
> put into the database that makes things happen the way you want, not a
> behavior-changing setting that comes along and ruins somebody's day.

I had incorrectly imagined that if the bootstrap superuser granted CREATEROLE 
to Alice with particular settings, those settings would limit the things that 
Alice could do when creating role Bob, specifically limiting how much she could 
administer/inherit/set role Bob thereafter.  Apparently, your proposal only 
configures what happens by default, and Alice can work around that if she wants 
to.  But if that's the case, did I misunderstand upthread that these are 
properties the superuser specifies about Alice?  Can Alice just set these 
properties about herself, so she gets the behavior she wants?  I'm confused now 
about who controls these settings.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Tests for psql \g and \o

2022-11-23 Thread Daniel Verite
Michael Paquier wrote:

> +psql_like($node, "SELECT 'one' \\g | cat >$g_file", qr//, "one command \\g");
> +my $c1 = slurp_file($g_file);
> +like($c1, qr/one/);
> 
> Windows may not have an equivalent for "cat", no?  Note that psql's
> 001_basic.pl has no restriction in place for Windows.  Perhaps you
> could use the same trick as basebackup_to_shell, where GZIP is used to
> write some arbitrary data..  Anyway, this has some quoting issues
> especially if the file's path has whitespaces?  This is located in
> File::Temp::tempdir, still it does not sound like a good thing to rely
> on this assumption on portability grounds.

PFA a new patch addressing these issues.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
diff --git a/src/bin/psql/t/001_basic.pl b/src/bin/psql/t/001_basic.pl
index f447845717..6fcbdeff7d 100644
--- a/src/bin/psql/t/001_basic.pl
+++ b/src/bin/psql/t/001_basic.pl
@@ -325,4 +325,24 @@ is($row_count, '10',
 	'client-side error commits transaction, no ON_ERROR_STOP and multiple -c switches'
 );
 
+# Test \g output piped into a program
+my $g_file = "$tempdir/g_file_1.out";
+# on Windows, findstr "^" with any text on stdin will copy it to stdout
+my $pipe_cmd = $windows_os ? "findstr \"^\" > \"$g_file\"" : "cat > \"$g_file\"";
+
+psql_like($node, "SELECT 'one' \\g | $pipe_cmd", qr//, "one command \\g");
+my $c1 = slurp_file($g_file);
+like($c1, qr/one/);
+
+psql_like($node, "SELECT 'two' \\; SELECT 'three' \\g | $pipe_cmd", qr//, "two 
commands \\g");
+my $c2 = slurp_file($g_file);
+like($c2, qr/two.*three/s);
+
+
+psql_like($node, "\\set SHOW_ALL_RESULTS 0\nSELECT 'four' \\; SELECT 'five' 
\\g | $pipe_cmd", qr//,
+  "two commands \\g with only last result");
+my $c3 = slurp_file($g_file);
+like($c3, qr/five/);
+unlike($c3, qr/four/);
+
 done_testing();
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index 5bdae290dc..29bc07c1b2 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -5433,6 +5433,136 @@ CONTEXT:  PL/pgSQL function warn(text) line 2 at RAISE
 \set SHOW_ALL_RESULTS on
 DROP FUNCTION warn(TEXT);
 --
+-- \g file
+--
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set outfile1 :abs_builddir '/results/psql-output1'
+-- this table is used to load back the output data from files
+CREATE TEMPORARY TABLE reload_output(
+ lineno int not null generated always as identity,
+ line text
+);
+SELECT 1 AS a \g :outfile1
+COPY reload_output(line) FROM :'outfile1';
+SELECT 2 AS b\; SELECT 3 AS c\; SELECT 4 AS d \g :outfile1
+COPY reload_output(line) FROM :'outfile1';
+COPY (select 'foo') to stdout \; COPY (select 'bar') to stdout \g :outfile1
+COPY reload_output(line) FROM :'outfile1';
+SELECT line FROM reload_output ORDER BY lineno;
+  line   
+-
+  a 
+ ---
+  1
+ (1 row)
+ 
+  b 
+ ---
+  2
+ (1 row)
+ 
+  c 
+ ---
+  3
+ (1 row)
+ 
+  d 
+ ---
+  4
+ (1 row)
+ 
+ foo
+ bar
+(22 rows)
+
+TRUNCATE TABLE reload_output;
+--
+-- \o file
+--
+\set outfile2 :abs_builddir '/results/psql-output2'
+\o :outfile2
+select max(unique1) from onek;
+SELECT 1 AS a\; SELECT 2 AS b\; SELECT 3 AS c;
+-- COPY TO file
+-- the data goes into :outfile1 and the command status into :outfile2
+\set QUIET false
+COPY (select unique1 from onek order by unique1 limit 10) TO :'outfile1';
+-- DML command status
+UPDATE onek SET unique1=unique1 WHERE false;
+\set QUIET true
+\o
+COPY reload_output(line) FROM :'outfile1';
+SELECT line FROM reload_output ORDER BY lineno;
+ line 
+--
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+(10 rows)
+
+TRUNCATE TABLE reload_output;
+COPY reload_output(line) FROM :'outfile2';
+SELECT line FROM reload_output ORDER BY lineno;
+   line   
+--
+  max 
+ -
+  999
+ (1 row)
+ 
+  a 
+ ---
+  1
+ (1 row)
+ 
+  b 
+ ---
+  2
+ (1 row)
+ 
+  c 
+ ---
+  3
+ (1 row)
+ 
+ COPY 10
+ UPDATE 0
+(22 rows)
+
+TRUNCATE TABLE reload_output;
+\o :outfile2
+-- multiple COPY TO stdout
+-- the data go into :outfile2 and the status is not output
+COPY (select 'foo1') to stdout \; COPY (select 'bar1') to stdout;
+-- combine \o and \g file with multiple COPY queries
+COPY (select 'foo2') to stdout \; COPY (select 'bar2') to stdout \g :outfile1
+\o
+COPY reload_output(line) FROM :'outfile1';
+SELECT line FROM reload_output ORDER BY lineno;
+ line 
+--
+ foo2
+ bar2
+(2 rows)
+
+TRUNCATE TABLE reload_output;
+COPY reload_output(line) FROM :'outfile2';
+SELECT line FROM reload_output ORDER BY lineno;
+ line 
+--
+ foo1
+ bar1
+(2 rows)
+
+TRUNCATE TABLE reload_output;
+--
 -- AUTOCOMMIT and combined queries
 --
 \set AUTOCOMMIT off
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index 8732017e51..ab1f71aa77 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1379,6 +1379,72 @@ SELECT 1 AS one \; SELECT warn('1.5') \; SELECT 2 AS two ;
 \set SHOW_ALL_RESULTS on
 DROP FUNCTION warn(TEXT);
 
+--
+-- \g file

Re: drop postmaster symlink

2022-11-23 Thread Joe Conway

On 11/23/22 15:10, Robert Haas wrote:

On Wed, Nov 23, 2022 at 2:50 PM Andres Freund  wrote:

On 2022-11-23 10:07:49 -0500, Tom Lane wrote:
> Devrim Gündüz  writes:
> > ...and it helps us to find the "main" process a bit easily.
>
> Hmm, that's a nontrivial point perhaps.  It's certain that this
> will break some other people's start scripts too.

OTOH, postmaster has been deprecated for ~15 years.


Yeah. Also, I don't think it's generally too hard to find the parent
process anyway, because at least on my system, the other ones end up
with ps display that looks like "postgres: logical replication
launcher" or whatever. The main process doesn't set the ps status
display, so that's the only one that shows a full path to the
executable in the ps status, which is how I usually spot it. That has
the advantage that it doesn't matter which name was used to launch it,
too.


Same here


I don't actually care very much whether we get rid of the postmaster
symlink or not, but if we aren't going to, we should stop calling it
deprecated. If 15 years isn't enough time to remove it, what ever will
be? I tend to think it's fairly pointless and perhaps also a bit
confusing, because the product is postgres not postmaster and people
can reasonably expect the binary name to match the product name. But
if we keep it, I don't think anything too dire will happen, either.


FWIW, the reason I took note of the postmaster symlink in the first 
place a few years ago was because selinux treats execution of programs 
from symlinks differently than from actual files.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: wake up logical workers after ALTER SUBSCRIPTION

2022-11-23 Thread Nathan Bossart
On Tue, Nov 22, 2022 at 07:25:36AM +, houzj.f...@fujitsu.com wrote:
> On Tuesday, November 22, 2022 2:49 PM Hayato Kuroda (Fujitsu) 
> 
>> Thanks for updating! It becomes better. Further comments:
>> 
>> 01. AlterSubscription()
>> 
>> ```
>> +LogicalRepWorkersWakeupAtCommit(subid);
>> +
>> ```
>> 
>> Currently subids will be recorded even if the subscription is not modified.
>> I think LogicalRepWorkersWakeupAtCommit() should be called inside the if
>> (update_tuple).
> 
> I think an exception would be REFRESH PUBLICATION, in which case update_tuple
> is false, but it seems better to wake up the apply worker in this case as well,
> because the apply worker is also responsible for starting table sync workers for
> newly subscribed tables (in process_syncing_tables()).
> 
> Besides, it seems not a must to wake up the apply worker for ALTER SKIP
> TRANSACTION, although there might be no harm in waking up in this case.

In v3, I moved the call to LogicalRepWorkersWakeupAtCommit() to the end of
the function.  This should avoid waking up workers in some cases where it's
unnecessary (e.g., if ALTER SUBSCRIPTION ERRORs in a subtransaction), but
there are still cases where we'll wake up the workers unnecessarily.  I
think this is unlikely to cause any real problems in practice.

>> 02. LogicalRepWorkersWakeupAtCommit()
>> 
>> ```
>> +oldcxt = MemoryContextSwitchTo(TopTransactionContext);
>> +on_commit_wakeup_workers_subids =
>> lappend_oid(on_commit_wakeup_workers_subids,
>> +
>>subid);
>> ```
>> 
>> If the subscription is altered twice in the same transaction, the same subid 
>> will
>> be recorded twice.
>> I'm not sure whether it may cause some issues, but list_member_oid() can
>> be used to avoid that.
> 
> +1, list_append_unique_oid might be better.

Done in v3.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From b80c5934b40034dfbbde411721db1d4171c86d5c Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 21 Nov 2022 16:01:01 -0800
Subject: [PATCH v3 1/1] wake up logical workers after ALTER SUBSCRIPTION

---
 src/backend/access/transam/xact.c|  3 ++
 src/backend/commands/subscriptioncmds.c  |  3 ++
 src/backend/replication/logical/worker.c | 46 
 src/include/replication/logicalworker.h  |  3 ++
 4 files changed, 55 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8086b857b9..dc00e66cfb 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -47,6 +47,7 @@
 #include "pgstat.h"
 #include "replication/logical.h"
 #include "replication/logicallauncher.h"
+#include "replication/logicalworker.h"
 #include "replication/origin.h"
 #include "replication/snapbuild.h"
 #include "replication/syncrep.h"
@@ -2360,6 +2361,7 @@ CommitTransaction(void)
 	AtEOXact_PgStat(true, is_parallel_worker);
 	AtEOXact_Snapshot(true, false);
 	AtEOXact_ApplyLauncher(true);
+	AtEOXact_LogicalRepWorkers(true);
 	pgstat_report_xact_timestamp(0);
 
 	CurrentResourceOwner = NULL;
@@ -2860,6 +2862,7 @@ AbortTransaction(void)
 		AtEOXact_HashTables(false);
 		AtEOXact_PgStat(false, is_parallel_worker);
 		AtEOXact_ApplyLauncher(false);
+		AtEOXact_LogicalRepWorkers(false);
 		pgstat_report_xact_timestamp(0);
 	}
 
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index d673557ea4..c761785947 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -34,6 +34,7 @@
 #include "nodes/makefuncs.h"
 #include "pgstat.h"
 #include "replication/logicallauncher.h"
+#include "replication/logicalworker.h"
 #include "replication/origin.h"
 #include "replication/slot.h"
 #include "replication/walreceiver.h"
@@ -1358,6 +1359,8 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 
 	table_close(rel, RowExclusiveLock);
 
+	LogicalRepWorkersWakeupAtCommit(subid);
+
 	ObjectAddressSet(myself, SubscriptionRelationId, subid);
 
 	InvokeObjectPostAlterHook(SubscriptionRelationId, subid, 0);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index e48a3f589a..d101cddf6c 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -253,6 +253,8 @@ WalReceiverConn *LogRepWorkerWalRcvConn = NULL;
 Subscription *MySubscription = NULL;
 static bool MySubscriptionValid = false;
 
+static List *on_commit_wakeup_workers_subids = NIL;
+
 bool		in_remote_transaction = false;
 static XLogRecPtr remote_final_lsn = InvalidXLogRecPtr;
 
@@ -4092,3 +4094,47 @@ reset_apply_error_context_info(void)
 	apply_error_callback_arg.remote_attnum = -1;
 	set_apply_error_context_xact(InvalidTransactionId, InvalidXLogRecPtr);
 }
+
+/*
+ * Wakeup the stored subscriptions' workers on commit if requested.
+ */
+void
+AtEOXact_LogicalRepWorkers(bool isCommit)
+{
+	if (isCommit && on_commit_wakeup_workers_subids != NIL)
+	{
+		ListCell  

Re: Patch: Global Unique Index

2022-11-23 Thread Cary Huang
Hi Simon



Thank you so much for sharing these valuable comments and concerns about our 
work. We understand there are a lot of TODOs left to be done to move this 
forward in a serious manner. Your comments have been very helpful and we are 
very grateful.



> You don't seem to mention that this would require a uniqueness check
> on each partition. Is that correct? This would result in O(N) cost of
> uniqueness checks, severely limiting load speed. I notice you don't
> offer any benchmarks on load speed or the overhead associated with
> this, which is not good if you want to be taken seriously, but at
> least it is recoverable.



Yes, during INSERT and UPDATE, the uniqueness check happens on every partition 
including the current one. This introduces extra look-up costs and will limit 
the speed significantly, especially when there is a large number of partitions. 
This is one drawback of global unique indexes that needs to be optimized / 
improved.



In fact, all other operations such as CREATE and ATTACH that involve a global 
uniqueness check will have a certain degree of performance loss as well. See 
benchmark figures below.





> (It might be necessary to specify some partitions as READ ONLY, to
> allow us to record their min/max values for the indexed cols, allowing
> us to do this more quickly.)



Thank you so much for this great suggestion. If there were an indication that 
some partitions have become READ ONLY, and we recorded the min/max values of 
their global unique indexed columns for those partitions, then we might be able 
to skip those partitions during uniqueness checking whenever the value falls 
outside the recorded (min/max) range? Did we understand it correctly? Could you 
help elaborate more?
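
A rough sketch of how we understand the short-circuit, with hypothetical
field names (real comparisons would go through the opclass comparison
function):

    /* before probing a partition's index for a duplicate ... */
    if (part->is_readonly &&
        (key < part->recorded_min || key > part->recorded_max))
        continue;               /* no conflict possible here */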





>> 1. Global unique index is supported only on btree index type
>
> Why? Surely any index type that supports uniqueness is good.



Yes, we can definitely have the same support for other index types that support 
UNIQUE.





>> - Not-supported Features -
>> 1. Global uniqueness check with Sub partition tables is not yet supported as 
>> we do not have immediate use case and it may involve major change in current 
>> implementation
>
> Hmm, sounds like a problem. Arranging the calls recursively should work.



Yes, it is a matter of rearranging the recursive calls to correctly find out 
all "leaf" partitions to be considered for the global uniqueness check. So far, 
only the partitions in the first layer are considered.





> My feeling is that performance on this will suck so badly that we must
> warn people away from it, and tell people if they want this, create
> the index at the start and let it load.



Yes, to support global unique index, extra logic needs to be run to ensure 
uniqueness, especially during INSERT and ATTACH where it needs to look up all 
involved partitions. Benchmark figures are attached below.



This is also the reason that the "global" syntax is required, so people know 
they really want to have this feature. To help users better understand the 
potential performance drawbacks, should we add a warning in the documentation?





> Hopefully CREATE INDEX CONCURRENTLY still works.



Yes, we verified this global unique index approach on Postgres 14.5 with a 
community CREATE INDEX CONCURRENTLY patch on partitioned table.





> Let's see some benchmarks on this also please.



Here is a simple 'timing' comparison between regular and global unique index on 
a partitioned table having 6 partitions.



global unique index:

-> 156,285ms to insert 6 million records (1 million on each partition)

-> 6,592ms to delete all 6 million records

-> 3,957ms to create global unique index with 6 million records pre-inserted

-> 3,650ms to attach a new partition with 1 million records pre-inserted

-> 17ms to detach a partition with 1 million records in it



regular unique index:

-> 26,007ms to insert 6 million records (1 million on each partition)

-> 7,238ms to delete all  6 million records

-> 2,933ms to create regular unique index with 6 million records pre-inserted

-> 628ms to attach a new partition with 1 million records pre-inserted

-> 17ms to detach a partition with 1 million records in it



These are the commands I use to get the numbers (switch create unique index 
clause between global and regular):

-> \timing on

-> create table test(a int, b int, c text) partition by range (a);

-> create table test1 partition of test for values from (MINVALUE) to (100);

-> create table test2 partition of test for values from (100) to (200);

-> create table test3 partition of test for values from (200) to (300);

-> create table test4 partition of test for values from (300) to (400);

-> create table test5 partition of test for values from (400) to (500);

-> create table test6 partition of test for values from (500) to (600);

-> create unique index myindex on test(b) global;

-> insert into test values(generate_series(0,599), 

Re: code cleanups

2022-11-23 Thread Tom Lane
Justin Pryzby  writes:
> Some modest cleanups I've accumulated.

Hmm ...

I don't especially care for either 0001 or 0002, mainly because
I do not agree that this is good style:

-   boolnulls[PG_STAT_GET_RECOVERY_PREFETCH_COLS];
+   boolnulls[PG_STAT_GET_RECOVERY_PREFETCH_COLS] = {0};

It causes the code to be far more in bed than I like with the assumption
that we're initializing to physical zeroes.  The explicit loop method
can be trivially adjusted to initialize to "true" or some other value;
at least for bool arrays, that's true of memset'ing as well.  But this,
if you decide you need something other than zeroes, is a foot-gun.
In particular, someone whose C is a bit weak might mistakenly think that

boolnulls[PG_STAT_GET_RECOVERY_PREFETCH_COLS] = {true};

will set all the array elements to true.  Nor is there a plausible
argument that this is more efficient.  So I don't care for this approach
and I don't want to adopt it.
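
To make the hazard concrete, here is a small standalone illustration
(mine, not from the patch) of what the partial initializer actually
yields:

    #include <stdbool.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* only nulls[0] is explicitly initialized; the remaining
         * elements are implicitly zero-initialized, i.e. false */
        bool        nulls[4] = {true};

        for (int i = 0; i < 4; i++)
            printf("nulls[%d] = %s\n", i, nulls[i] ? "true" : "false");
        return 0;   /* prints true, false, false, false */
    }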

0003: I agree with getting rid of the duplicated code, but did you go
far enough?  Isn't the code just above those parent checks also pretty
redundant?  It would be more intellectually consistent to move the full
responsibility for setting acl_ok into a subroutine.  This shows in
the patch as you have it because the header comment for
recheck_parent_acl is completely out-of-context.

0004: Right, somebody injected code in a poorly chosen place
(yet another victim of the "add at the end" anti-pattern).

0005: No particular objection, but it's not much of an improvement
either.  It seems maybe a shade less consistent with the following
line.

0006: These changes will cause fetching of one more source byte than
was fetched before.  I'm not sure that's safe, so I don't think this
is an improvement.

regards, tom lane




Re: Transparent column encryption

2022-11-23 Thread Peter Eisentraut

On 28.10.22 12:16, Jehan-Guillaume de Rorthais wrote:

I did a review of the documentation and usability.


I have incorporated some of your feedback into the v11 patch I just posted.


# Applying patch

   The patch applied on top of f13b2088fa2 without trouble. I noticed a small
   warning during compilation:
   
 colenccmds.c:134:27: warning: ‘encval’ may be used uninitialized
   
   A simple fix could be:
   
 +++ b/src/backend/commands/colenccmds.c

 @@ -119,2 +119,3
 encval = defGetString(encvalEl);
 +   *encval_p = encval;
 }
 @@ -132,4 +133,2
 *alg_p = alg;
 -   if (encval_p)
 -   *encval_p = encval;
  }


fixed


# Documentation

   * In page "create_column_encryption_key.sgml", both encryption algorithms for
 a CMK are declared as the default one:

 + 
 +  The encryption algorithm that was used to encrypt the key material 
of
 +  this column encryption key.  Supported algorithms are:
 +  
 +   
 +RSAES_OAEP_SHA_1 (default)
 +   
 +   
 +RSAES_OAEP_SHA_256 (default)
 +   
 +  
 + 
   
  As far as I understand the code, I suppose RSAES_OAEP_SHA_1 should be the
  default.


fixed


   I believe two pieces of information should be clearly shown to the user
   somewhere in chapter 5.5 instead of being buried deep in the documentation:
  
   * «COPY does not support column decryption», currently buried in pg_dump page

   * «When transparent column encryption is enabled, the client encoding must
 match the server encoding», currently buried in the protocol description
 page.

   * In the libpq doc of PQexecPrepared2, "paramForceColumnEncryptions" might
 deserve a little more detail about the array format, like «0 means "don't
 enforce" and anything else enforce the encryption is enabled on this
 column». By the way, maybe this array could be an array of boolean?

   * In chapter 55.2.5 (protocol-flow) it is stated: «when column encryption is
  used, the plaintext is always in text format (not binary format).». Does it
  mean parameter "resultFormat" in "PQexecPrepared2" should always be 0? If
 yes, is it worth keeping this argument? Moreover, this format constraint
 should probably be explained in the libpq page as well.


I will keep these suggestions around.  Some of these things will 
probably change again, so I'll make sure to update the documentation 
when I touch it again.



# Protocol

   * In the ColumnEncryptionKey message, it seems the field holding the length
 key material is redundant with the message length itself, as all other
 fields have a known size. The key material length is the message length -
 (4+4+4+2). For comparison, the AuthenticationSASLContinue message has a
 variable data size but rely only on the message length without additional
 field.


I find that weird, though.  An explicit length seems better.  Things 
like AuthenticationSASLContinue only work if they have exactly one 
variable-length data item.



   * I wonder if encryption related fields in ParameterDescription and
 RowDescription could be optional somehow? The former might be quite large
 when using a lot of parameters (like, imagine a large and ugly
 "IN($1...$N)"). On the other hand, these messages are not sent in high
 frequency anyway...


They are only used if you turn on the column_encryption protocol option.
Or did you mean making them optional even then?



# libpq

   Would it be possible to have an encryption-ready PQexecParams() equivalent
   of what PQprepare/describe/exec do?


I plan to do that.


# psql

   * You already mark \d in the TODO. But having some psql command to list the
 known CEK/CMK might be useful as well.


right


   * About write queries using psql, would it be possible to use psql
     parameters? E.g.:

   => \set c1 myval
   => INSERT INTO mytable VALUES (:'c1') \gencr


No, because those are resolved by psql before libpq sees them.


# Manual tests

   * The lookup error message is shown twice for some reason:

   => select * from test_tce;
   no CMK lookup found for realm ""
   
   no CMK lookup found for realm ""


 It might be worth adding the column name and the CMK/CEK names related to each
 error message? Lastly, notice the useless empty line between messages.


I'll look into that.


   * When "DROP IF EXISTS" a missing CEK or CMK, the command raise an
 "unrecognized object type":

   => DROP COLUMN MASTER KEY IF EXISTS noexists;
   ERROR:  unrecognized object type: 10
   => DROP COLUMN ENCRYPTION KEY IF EXISTS noexists;
   ERROR:  unrecognized object type: 8


fixed


   * I was wondering what "pg_typeof" should return. It currently returns
 "pg_encrypted_*". If this is supposed to be transparent from the client
 perspective, shouldn't it 

Re: fixing CREATEROLE

2022-11-23 Thread Robert Haas
On Wed, Nov 23, 2022 at 1:11 PM Tom Lane  wrote:
> I haven't thought about these issues hard enough to form an overall
> opinion (though I agree that making CREATEROLE less tantamount
> to superuser would be an improvement).  However, I share Mark's
> discomfort about making these commands act differently depending on
> context.  We learned long ago that allowing GUCs to affect query
> semantics was a bad idea.  Basing query semantics on properties
> of the issuing role (beyond success-or-permissions-failure) doesn't
> seem a whole lot different from that.  It still means that
> applications can't just issue command X and expect Y to happen;
> they have to inquire about context in order to find out that Z might
> happen instead.  That's bad in any case, but it seems especially bad
> for security-critical behaviors.

I'm not sure that this behavior qualifies as security-critical. If
INHERITCREATEDROLES and SETCREATEDROLES are both true, then the grant
has INHERIT TRUE and SET TRUE and there are no more rights to be
gained. If not, the createrole user can do something like GRANT
new_role TO my_own_account WITH INHERIT TRUE, SET TRUE. Even if we
somehow disallowed that, they could gain access to the privilege of
the created role in a bunch of other ways, such as granting the rights
to someone else, or just changing the password and using the new
password to log into the account.

When I started working in this area, I thought non-inherited grants
were pretty useless, because you can so easily work around it.
However, other people did not agree. From what I can gather, I think
the reason why people like non-inherited grants is that they prevent
mistakes. A user who has ADMIN OPTION on another role but does not
inherit its privileges can break into that account and do whatever
they want, but they cannot ACCIDENTALLY perform an operation that
makes use of that user's privileges. They will have to SET ROLE, or
GRANT themselves something, and those actions can be logged and
audited if desired. Because of the potential for that sort of logging
and auditing, you can certainly make an argument that this is a
security-critical behavior, but it's not that clear cut, because it's
making assumptions about the behavior of other software, and of human
beings. Strictly speaking, looking just at PostgreSQL, these options
don't affect security.

On the more general question of configurability, I tend to agree that
it's not great to have the behavior of commands depend too much on
context, especially for security-critical things. A particularly toxic
example IMHO is search_path, which I think is an absolute disaster in
terms of security that I believe we will never be able to fully fix.
Yet there are plenty of examples of configurability that no one finds
problematic. No one agitates against the idea that a database can have
a default tablespace, or that you can ALTER USER or ALTER DATABASE to
configure a setting on a user-specific or database-specific basis,
even a security-critical one like search_path, or one that affects
query behavior like work_mem. No one is outraged that a data type has
a default btree operator class that is used for indexes unless you
specify another one explicitly. What people mostly complain about IME
is stuff like standard_conforming_strings, or bytea_output, or
datestyle. Often, when proposals come up on pgsql-hackers and get shot
down on these grounds, the issue is that they would essentially make
it impossible to write SQL that will run portably on PostgreSQL.
Instead, you'd need to write your application to issue different SQL
depending on the value of settings on the local system. That's un-fun
at best, and impossible at worst, as in the case of extension scripts,
whose content has to be static when you ship the thing.

But it's not exactly clear to me what the broader organizing principle
is here, or ought to be. I think it would be ridiculous to propose --
and I assume that you are not proposing -- that no command should have
behavior that in any way depends on what SQL commands have been
executed previously. Taken to a silly extreme, that would imply that
CREATE TABLE ought to be removed, because the behavior of SELECT *
FROM something will otherwise depend on whether someone has previously
issued CREATE TABLE something. Obviously that is a stupid argument.
But on the other hand, it would also be ridiculous to propose the
reverse, that it's fine to add arbitrary settings that affect the
behavior of any command whatsoever in arbitrary ways. Simon's proposal
to add a GUC that would make vacuum request a background vacuum rather
than performing one in the foreground is a good example of a proposal
that did not sit well with either of us.

But I don't know on what basis exactly we put a proposal like this in
one category rather than the other. I'm not sure I can really
articulate the general principle in a sensible way. For me, this
clearly falls into the "good" category: it's configuration 

Re: Documentation for building with meson

2022-11-23 Thread samay sharma
Hi,

On Tue, Nov 22, 2022 at 10:36 PM Justin Pryzby  wrote:

> On Mon, Nov 14, 2022 at 10:41:21AM -0800, samay sharma wrote:
>
> > You need LZ4, if you want to support compression of data with that
> > method; see default_toast_compression and wal_compression.
>
> => The first comma is odd.  Maybe it should say "LZ4 is needed to
> support .."
>
> > You need Zstandard, if you want to support compression of data or
> > backups with that method; see wal_compression. The minimum required
> > version is 1.4.0.
>
> Same.
>
> Also, since v15, LZ4 and zstd can both be used by basebackup.
>
> >Some commonly used ones are mentioned in the subsequent sections
>
> => Some commonly used options ...
>
> >  Most of these require additional software, as described in Section
> >  17.3.2, and are set to be auto features.
>
> => "Are set to be auto features" sounds odd.  I think it should say
> something like " .. and are automatically enabled if the required
> software is detected.".
>
> > You can change this behavior by manually setting the auto features to
> > enabled to require them or disabled to not build with them.
>
> remove "auto".  Maybe "enabled" and "disabled" need markup.
>
> > On Windows, the default WinLDAP library is used. It defults to auto
>
> typo: defults
>
> > It defaults to auto and libsystemd and the associated header files need
> > to be installed to use this option.
>
> => write this as two separate sentences.  Same for libxml.
>
> > bsd to use the UUID functions found in FreeBSD, NetBSD, and some other
> > BSD-derived systems
>
> => should remove mention of netbsd, like c4b6d218e
>
> > Enables use of the Zlib library. It defaults to auto and enables
> > support for compressed archives in pg_dump ,pg_restore and
> > pg_basebackup and is recommended.
>
> => The comma is mis-placed.
>
> > The default backend meson uses is ninja and that should suffice for
> > most use cases. However, if you'd like to fully integrate with
> > visual studio, you can set the BACKEND to vs.
>
> => BACKEND is missing markup.
>
> > This option can be used to specify the buildtype to use; defaults to
> > release
>
> => release is missing markup
>
> >  Specify the optimization level. LEVEL can be set to any of
> >  {0,g,1,2,3,s}.
>
> => LEVEL is missing markup
>

Thanks for the feedback. Addressed all and added markup at a few more
places in v6 (attached).

Regards,
Samay

>
> Thanks,
> --
> Justin
>


v6-0001-Add-docs-for-building-with-meson.patch
Description: Binary data


Re: Add sub-transaction overflow status in pg_stat_activity

2022-11-23 Thread Robert Haas
On Wed, Nov 23, 2022 at 2:01 PM Bruce Momjian  wrote:
> I originally thought having this value in pg_stat_activity was overkill,
> but seeing the other internal/warning columns in that view, I think it
> makes sense.  Oddly, is our 64 snapshot performance limit even
> documented anywhere?  I know it is in Simon's patch I am working on.

If it is, I'm not aware of it. We often don't document things that are
as internal as that.

One thing that I'd really like to see better documented is exactly
what it is that causes a problem. But first we'd have to understand it
ourselves. It's not as simple as "if you have more than 64 subxacts in
any top-level xact, kiss performance good-bye!" because for there to
be a problem, at least one backend (and probably many) have to take
snapshots that see that overflowed subxact cache and thus
get marked suboverflowed. Then after that, those snapshots have to be
used often enough that the additional visibility-checking cost becomes
a problem. But it's also not good enough to just use those snapshots
against any old tuples, because tuples that are older than the
snapshot's xmin aren't going to cause additional lookups, nor are
tuples newer than the snapshot's xmax.
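
For reference, the extra cost lives in the overflowed branch of
XidInMVCCSnapshot(); roughly paraphrased (a sketch, not the exact
source):

    if (snapshot->suboverflowed)
    {
        /*
         * The snapshot couldn't record every in-progress subxid, so
         * any xid in the xmin..xmax window might be a subtransaction.
         * Map it to its top-level parent, which costs a pg_subtrans
         * (SLRU, possibly on-disk) lookup per tuple examined ...
         */
        xid = SubTransGetTopmostTransaction(xid);

        if (TransactionIdPrecedes(xid, snapshot->xmin))
            return false;       /* parent already completed */
    }
    /* ... and only then search the top-level xip array as usual */

That per-tuple pg_subtrans traffic is where the extra cost shows up.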

So it feels a bit complicated to me to think through the workload
where this really hurts. What I'm imagining is that you need a
relatively long-running transaction that overflows its subxact
limitation but then doesn't commit, so that lots of other backends get
overflowed snapshots, and also so that the xmin and xmax of the
snapshots being taken get further apart. Or maybe you can have a
series of short-running transactions that each overflow their subxact
cache briefly, but they overlap, so that there's usually at least 1
around in that state, but in that case I think you need a separate
long-running transaction to push xmin and xmax further apart. Either
way, the backends that get the overflowed snapshots then need to go
look at some table data that's been recently modified, so that there
are xmin and xmax values newer than the snapshot's xmin.

Intuitively, I feel like this should be pretty rare, and largely
avoidable if you just don't use long-running transactions, which is a
good thing to avoid for other reasons anyway. But there may be more to
it than I'm realizing, because I've seen customers hit this issue
multiple times. I wonder whether there's some subtlety to the
triggering conditions that I'm not fully understanding.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: fixing CREATEROLE

2022-11-23 Thread David G. Johnston
On Wed, Nov 23, 2022 at 1:04 PM Robert Haas  wrote:

>
> I'm not very certain about any of that stuff; I don't have a clear
> mental model of how it should work, or even what exact problem we're
> trying to solve. To me, the patches that I posted make sense as far as
> they go, but I'm not under the illusion that they solve all the
> problems in this area, or even that I understand what all of the
> problems are.
>
>
I haven't yet formed a complete thought here but is there any reason we
cannot convert the permission-like attributes to predefined roles?

pg_login
pg_replication
pg_bypassrls
pg_createdb
pg_createrole
pg_haspassword (password and valid until)
pg_hasconnlimit

Presently, attributes are never inherited, but having that be controlled
via the INHERIT property of the grant seems desirable.

WITH ADMIN controls passing on of membership to other roles.

Example:
I have pg_createrole (set, noinherit, no with admin), pg_password (no set,
inherit, no with admin), and pg_createdb (set, inherit, with admin),
pg_login (no set, inherit, with admin)
Roles I create cannot be members of pg_createrole or pg_password but can be
given pg_createdb and pg_login (this would be a way to enforce external
authentication for roles created by me)
I can execute CREATE DATABASE due to inheriting pg_createdb
I must set role to pg_createrole in order to execute CREATE ROLE
Since I don't have admin on pg_createrole I cannot change my own
set/inherit, but I could do that for pg_createdb

David J.


Re: Documentation for building with meson

2022-11-23 Thread samay sharma
Hi,

On Wed, Nov 23, 2022 at 12:16 PM Justin Pryzby  wrote:

> On Wed, Nov 23, 2022 at 11:30:54AM -0800, samay sharma wrote:
> > Thanks for the feedback. Addressed all and added markup at a few more
> > places in v6 (attached).
>
> Thanks.  It looks good to me.  A couple thoughts, maybe they're not
> important.
>

Thank you. Attaching v7 addressing most of the points below.


>
>  - LZ4 and Zstd refer to wal_compression and default_toast_compression,
>but not to any corresponding option to basebackup.
>
>  - There's no space after the hash mark here; but above, there was:
>#Setup build directory with a different installation prefix
>

Added a space as that looks better.

>
>  - You use slash to show enumerated options, but it's more typical to
>use braces: {a | b | c}:
>-Dnls=auto/enabled/disabled
>

Changed.

>
>  - There's no earlier description/definition of an "auto" feature, but
>still says this:
>"Setting this option allows you to override value of all 'auto'
> features"
>

Described what an "auto" feature is in ().

>
>  - Currently the documentation always refers to "PostgreSQL", but you
>added two references to "Postgres":
>+ If a program required to build Postgres...
>+ Once Postgres is built...
>

Good catch. Changed to PostgreSQL.

Regards,
Samay

>
> --
> Justin
>


v7-0001-Add-docs-for-building-with-meson.patch
Description: Binary data


Re: drop postmaster symlink

2022-11-23 Thread Tom Lane
Andres Freund  writes:
> On 2022-11-23 10:07:49 -0500, Tom Lane wrote:
>> On the whole, is it really that hard to add the symlink to the meson build?

> No. Meson has a builtin command for it, just not in the meson version we're
> currently requiring. We can create the symlink ourselves instead. The problem
> is just detecting systems where we can't symlink and what to fall back to
> there.

This isn't a hill I want to die on, either way.  But "it's a bit
more complicated in meson" seems like a poor reason for changing
the user-visible installed fileset.

regards, tom lane




Re: Add LZ4 compression in pg_dump

2022-11-23 Thread Justin Pryzby
On Tue, Nov 22, 2022 at 10:00:47AM +, gkokola...@pm.me wrote:
> For the record I am currently working on it simply unsure if I should submit
> WIP patches and add noise to the list or wait until it is in a state that I
> feel that the comments have been addressed.
> 
> A new version that I feel that is in a decent enough state for review should
> be ready within this week. I am happy to drop the patch if you think I should
> not work on it though.

I hope you'll want to continue work on it.  The patch record is like a
request for review, so it's closed if there's nothing ready to review.

I think you should re-send patches (and update the CF app) as often as
they're ready for more review.  Your 001 commit (which is almost the
same as what I wrote 2 years ago) still needs to account for some review
comments, and the whole patch set ought to pass cirrusci tests.  At that
point, you'll be ready for another round of review, even if there's
known TODO/FIXME items in later patches.

BTW I saw that you updated your branch on github.  You'll need to make
the corresponding changes to ./meson.build that you made to ./Makefile.
https://wiki.postgresql.org/wiki/Meson_for_patch_authors
https://wiki.postgresql.org/wiki/Meson  

   

-- 
Justin




Re: fixing CREATEROLE

2022-11-23 Thread Robert Haas
On Wed, Nov 23, 2022 at 3:33 PM Tom Lane  wrote:
> > Because they are role-level properties, they can be set by whoever has
> > ADMIN OPTION on the role. That always includes every superuser, and it
> > never includes Alice herself (except if she's a superuser).
>
> That is just bizarre.  Alice can do X, and she can do Y, but she
> can't control a flag that says which of those happens by default?
> How is that sane (disregarding the question of whether the existence
> of the flag is a good idea, which I'm now even less sold on)?

Look, I admitted later in that same email that I don't really know
what the rules for setting role-level properties ought to be. If you
have an idea, I'd love to hear it, but I'd rather if you didn't just
label things into which I have put quite a bit of work as insane
without giving any constructive feedback, especially if you haven't
yet fully understood the proposal.

Your description of the behavior here is not quite accurate.
Regardless of how the flags are set, alice, as a CREATEROLE user, can
gain access to all the privileges of the target role, and she can
arrange to have a grant of permissions on that role with INHERIT TRUE
and SET TRUE. However, there's a difference between the case where (a)
INHERITCREATEDROLE and SETCREATEDROLE are set, and alice gets the
permissions of the role by default and the one where (b)
NOINHERITCREATEDROLE and NOSETCREATEDROLE are set, and therefore alice
gets the permissions only if she does GRANT created_role TO ALICE WITH
INHERIT TRUE, SET TRUE. In the former case, there is only one grant,
and it has 
grantor=bootstrap_superuser/admin_option=true/inherit_option=true/set_option=true.
In the latter case there are two, one with
grantor=bootstrap_superuser/admin_option=true/set_option=false/inherit_option=false
and a second with
grantor=alice/admin_option=false/set_option=true/inherit_option=true.
That is pretty nearly equivalent, but it is not the same, and it will
not, for example, be dumped in the same way. Furthermore, it's not
equivalent in the other direction at all. If the superuser gives alice
INHERITCREATEDROLES and SETCREATEDROLES, she can't renounce those
permissions in the patch as written. All of which is to say that I
don't think your characterization of this as "Alice can do X, and she
can do Y, but she can't control a flag that says which of those
happens by default?" is really correct. It's subtler than that.

But having said that, I could certainly change the patches so that any
user, or maybe just a createrole user since it's otherwise irrelevant,
can flip the INHERITCREATEDROLE and SETCREATEDROLE bits on their own
account. There would be no harm in that from a security or auditing
perspective AFAICS. It would, however, make the handling of those
flags different from the handling of most other role-level properties.
That is in fact why things ended up the way that they did: I just made
the new role-level properties which I added work like most of the
existing ones. I don't think that's insane at all. I even think it
might be the right decision. But it's certainly arguable. If you think
it should work differently, make an argument for that. What I would
particularly like to hear in such an argument, though, is a theory
that goes beyond those two particular properties and addresses what
ought to be done with all the other ones, especially CREATEDB and
CREATEROLE. If we can't come up with such a grand unifying theory but
are confident we know what to do about this case, so be it, but we
shouldn't make an idiosyncratic rule for this case without at least
thinking about the overall picture.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: drop postmaster symlink

2022-11-23 Thread Andres Freund
Hi,

On 2022-11-23 15:48:04 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-11-23 10:07:49 -0500, Tom Lane wrote:
> >> On the whole, is it really that hard to add the symlink to the meson build?
>
> > No. Meson has a builtin command for it, just not in the meson version we're
> > currently requiring. We can create the symlink ourselves instead. The 
> > problem
> > is just detecting systems where we can't symlink and what to fall back to
> > there.
>
> This isn't a hill I want to die on, either way.  But "it's a bit
> more complicated in meson" seems like a poor reason for changing
> the user-visible installed fileset.

I wouldn't even have thought about proposing dropping the symlink if it hadn't
been deprecated forever, and I suspect Peter wouldn't have either...

I think this is a bit more complicated than "changing the user-visible
installed fileset" because we didn't have logic to create 'postmaster' on
windows before, afaik the only OS without reliable symlink support.

Anyway, my current thinking is to have dumb OS dependent behaviour and create
a symlink everywhere but windows, where we'd just copy the file.

Or we could just continue to not install 'postmaster' on windows, because of
the potential confusion that 'postmaster.exe' differing from 'postgres.exe'
could cause.

Greetings,

Andres Freund




Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2022-11-23 Thread Thomas Munro
On Wed, Nov 23, 2022 at 11:03 PM Thomas Munro  wrote:
> On Wed, Nov 23, 2022 at 2:42 PM Andres Freund  wrote:
> > The failure has to be happening in wait_for_postmaster_promote(), because 
> > the
> > standby2 is actually successfully promoted.
>
> I assume this is ext4.  Presumably anything that reads the
> controlfile, like pg_ctl, pg_checksums, pg_resetwal,
> pg_control_system(), ... by reading without interlocking against
> writes could see garbage.  I have lost track of the versions and the
> thread, but I worked out at some point by experimentation that this
> only started relatively recently for concurrent read() and write(),
> but always happened with concurrent pread() and pwrite().  The control
file uses the non-p variants, which didn't mash old/new data like
grated cheese under concurrency due to some implementation detail, but
now do.

As for what to do about it, some ideas:

1.  Use advisory range locking.  (This would be an advisory version of
what many other filesystems do automatically, AFAIK.  Does Windows
have a thing like POSIX file locking, or need one here?)
2.  Retry after a short time on checksum failure.  The probability is
already minuscule, and becomes pretty close to 0 if we read thrice
100ms apart.
3.  Some scheme that involves renaming the file into place.  (That
might be a pain on Windows; it only works for the relmap thing because
all readers and writers are in the backend and use an LWLock to avoid
silly handle semantics.)
4.  ???

First thought is that 2 is the appropriate level of complexity for this
rare and stupid problem.
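
For 2, a rough sketch of what the retrying readers could do; the
read_controlfile_once() helper is hypothetical, everything else uses
the existing CRC macros:

    static ControlFileData *
    read_controlfile_with_retry(const char *path)
    {
        for (int attempt = 0;; attempt++)
        {
            ControlFileData *cf = read_controlfile_once(path);
            pg_crc32c   crc;

            /* same CRC check that get_controlfile() performs */
            INIT_CRC32C(crc);
            COMP_CRC32C(crc, (char *) cf, offsetof(ControlFileData, crc));
            FIN_CRC32C(crc);

            if (EQ_CRC32C(crc, cf->crc) || attempt >= 2)
                return cf;      /* good copy, or give up and let the
                                 * caller report genuine corruption */

            pfree(cf);
            pg_usleep(100 * 1000);      /* 100ms, then re-read */
        }
    }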




Re: More efficient build farm animal wakeup?

2022-11-23 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Nov 24, 2022 at 10:00 AM Magnus Hagander  wrote:
>> On Wed, Nov 23, 2022 at 9:15 AM Thomas Munro  wrote:
>> Are you saying you still think it's worth pursuing longpoll or similar 
>> methods for it, or that this is good enough?

> I personally think it'd be pretty neat, to squeeze out that last bit
> of latency.  Maybe it's overkill...

I can't get excited about pursuing the last ~30 seconds of delay
for launching tasks that are going to run 10 or 20 or more minutes
(where the future trend of those numbers is surely up not down).

The thing that was really significantly relevant here IMO was to
reduce the load on the central server, and I think we've done that.
Would adding longpoll reduce that further?  In principle maybe,
but I'm not sure we have enough animals to make it worthwhile.

regards, tom lane




Re: Another multi-row VALUES bug

2022-11-23 Thread Dean Rasheed
On Wed, 23 Nov 2022 at 15:30, Tom Lane  wrote:
>
> > So I think what the code needs to do is examine the targetlist, and
> > identify the VALUES RTE that the current query is using as a source,
> > and rewrite just that RTE (so any original VALUES RTE is rewritten at
> > the top level, and any VALUES RTEs from rule actions are rewritten
> > while recursing, and none are rewritten more than once), as in the
> > attached patch.
>
> Hmm ... this patch does not feel any more principled or future-proof
> than what it replaces, because now instead of making assumptions
> about what's in the jointree, you're making assumptions about what's
> in the targetlist.

True, but it's consistent with what rewriteValuesRTE() does -- it has
to examine the targetlist to work out how items in the VALUES lists
are mapped to attributes of the target relation.

> I wonder if there is some other way to identify
> the target VALUES RTE.
>
> Looking at the parsetree in gdb, I see that in this example the
> VALUES RTE is still the first entry in the fromlist, it's just not
> the only one there.  So I wonder whether it'd be sufficient to do
>
> -if (list_length(parsetree->jointree->fromlist) == 1)
> +if (parsetree->jointree->fromlist != NIL)
>
> I'm not 100% sure that product-query rewriting would always produce
> a FROM-list in this order, but I think it might be true.

No, the test case using rule r3 is a counter-example. In that case,
the product query has 2 VALUES RTEs, both of which appear in the
fromlist, and it's the second one that needs rewriting when it
recurses into the product query.

In fact, looking at what rewriteRuleAction() does, the relevant VALUES
RTE will be the last or last-but-one entry in the fromlist, depending
on whether the rule action refers to OLD. Relying on a particular
ordering of the fromlist seems quite fragile though.

> Another idea is to identify the VALUES RTE before we start rewriting,
> and pass that information on.  That should be pretty bulletproof,
> but of course more invasive.
>
> Or ... maybe we should perform this particular step before we build
> product queries?  Just because we stuck it into QueryRewrite
> originally doesn't mean that's the right place.

Hmm, I'm not quite sure how that would work. Possibly we could
identify the VALUES RTE while building the product query, but that
looks pretty messy.

Regards,
Dean




Re: Hash index build performance tweak from sorting

2022-11-23 Thread Tomas Vondra



On 11/23/22 14:07, David Rowley wrote:
> On Fri, 18 Nov 2022 at 03:34, Tomas Vondra
>  wrote:
>> I did some simple benchmark with v2 and v3, using the attached script,
>> which essentially just builds hash index on random data, with different
>> data types and maintenance_work_mem values. And what I see is this
>> (median of 10 runs):
> 
>> So to me it seems v2 performs demonstrably better, v3 is consistently
>> slower - not only compared to v2, but often also to master.
> 
> Could this just be down to code alignment changes?  There does not
> really seem to be any fundamental differences which would explain
> this.
> 

Could be, but then how do we know the speedup with v2 is not due to code
alignment too?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: drop postmaster symlink

2022-11-23 Thread Robert Haas
On Wed, Nov 23, 2022 at 2:50 PM Andres Freund  wrote:
> On 2022-11-23 10:07:49 -0500, Tom Lane wrote:
> > Devrim Gündüz  writes:
> > > ...and it helps us to find the "main" process a bit easily.
> >
> > Hmm, that's a nontrivial point perhaps.  It's certain that this
> > will break some other people's start scripts too.
>
> OTOH, postmaster has been deprecated for ~15 years.

Yeah. Also, I don't think it's generally too hard to find the parent
process anyway, because at least on my system, the other ones end up
with ps display that looks like "postgres: logical replication
launcher" or whatever. The main process doesn't set the ps status
display, so that's the only one that shows a full path to the
executable in the ps status, which is how I usually spot it. That has
the advantage that it doesn't matter which name was used to launch it,
too.

I don't actually care very much whether we get rid of the postmaster
symlink or not, but if we aren't going to, we should stop calling it
deprecated. If 15 years isn't enough time to remove it, what ever will
be? I tend to think it's fairly pointless and perhaps also a bit
confusing, because the product is postgres not postmaster and people
can reasonably expect the binary name to match the product name. But
if we keep it, I don't think anything too dire will happen, either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Documentation for building with meson

2022-11-23 Thread Justin Pryzby
On Wed, Nov 23, 2022 at 11:30:54AM -0800, samay sharma wrote:
> Thanks for the feedback. Addressed all and added markup at a few more
> places in v6 (attached).

Thanks.  It looks good to me.  A couple thoughts, maybe they're not
important.

 - LZ4 and Zstd refer to wal_compression and default_toast_compression,
   but not to any corresponding option to basebackup.

 - There's no space after the hash mark here; but above, there was:
   #Setup build directory with a different installation prefix

 - You use slash to show enumerated options, but it's more typical to
   use braces: {a | b | c}:
   -Dnls=auto/enabled/disabled

 - There's no earlier description/definition of an "auto" feature, but
   still says this:
   "Setting this option allows you to override value of all 'auto' features"

 - Currently the documentation always refers to "PostgreSQL", but you
   added two references to "Postgres":
   + If a program required to build Postgres...
   + Once Postgres is built...

-- 
Justin




Re: Document parameter count limit

2022-11-23 Thread Tom Lane
"David G. Johnston"  writes:
> I do believe that people who want to use a large parameter list likely have
> that question in the back of their mind, and looking at a page called
> "System Limits" is at least plausibly something they would do.  Since they
> are really caring about parse-bind-execute, and they aren't likely to dig
> into libpq, this seems like the best spot (as opposed to, say PREPARE)

This is a wire-protocol limitation; libpq is only the messenger.
So if we're going to document it, I agree that limits.sgml is the place.

(BTW, I'm not certain that PREPARE has the same limit.  It'd fall over
at INT_MAX likely, or maybe sooner for lack of memory, but I don't
recall that there are any uint16 fields in that code path.)
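
(The wire-level ceiling is visible where the backend parses the Bind
message; paraphrasing from memory, so treat this as a sketch:

    numParams = pq_getmsgint(input_message, 2); /* 16 bits: max 65535 */

so neither side of the protocol can exceed 65535 parameters per Bind.)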

regards, tom lane




Re: Add sub-transaction overflow status in pg_stat_activity

2022-11-23 Thread Andres Freund
Hi,

On 2022-11-23 15:25:39 -0500, Robert Haas wrote:
> One thing that I'd really like to see better documented is exactly
> what it is that causes a problem. But first we'd have to understand it
> ourselves. It's not as simple as "if you have more than 64 subxacts in
> any top-level xact, kiss performance good-bye!" because for there to
> be a problem, at least one backend (and probably many) have to take
> snapshots that see that overflowed subxact cache and thus
> get marked suboverflowed. Then after that, those snapshots have to be
> used often enough that the additional visibility-checking cost becomes
> a problem. But it's also not good enough to just use those snapshots
> against any old tuples, because tuples that are older than the
> snapshot's xmin aren't going to cause additional lookups, nor are
> tuples newer than the snapshot's xmax.

Indeed. This is why I was thinking that just alerting for overflowed xact
isn't particularly helpful. You really want to see how much they overflow and
how often.

But even that might not be that helpful. Perhaps what we actually need is an
aggregate measure showing the time spent doing subxact lookups due to
overflowed snapshots? Seeing a substantial amount of time spent doing subxact
lookups would be a much more accurate call to action than seeing that some
sessions have a lot of subxacts.


> Intuitively, I feel like this should be pretty rare, and largely
> avoidable if you just don't use long-running transactions, which is a
> good thing to avoid for other reasons anyway.

I think they're just not always avoidable, even in a very well operated
system.


I wonder if we could lower the impact of suboverflowed snapshots by improving
the representation in PGPROC and SnapshotData. What if we

a) Recorded the min and max assigned subxid in PGPROC

b) Instead of giving up in GetSnapshotData() once we see a suboverflowed
   PGPROC, store the min/max subxid of the proc in SnapshotData. We could
   reliably "steal" space for that from ->subxip, as we won't need to store
   subxids for that proc.

c) When determining visibility with a suboverflowed snapshot we use the
   ranges from b) to check whether we need to do a subtrans lookup. I think
   that'll often prevent subtrans lookups.

d) If we encounter a subxid whose parent is in progress and not in ->subxip,
   and subxcnt isn't the max, add that subxid to subxip. That's not free
   because we'd basically need to do an insertion sort, but likely still a lot
   cheaper than doing repeated subtrans lookups.

I think we'd just need one or two additional fields in SnapshotData.
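
A minimal sketch of (c), with hypothetical snapshot fields standing in
for the per-proc ranges recorded in (b):

    if (snapshot->suboverflowed &&
        (TransactionIdPrecedes(xid, snapshot->min_overflowed_subxid) ||
         TransactionIdFollows(xid, snapshot->max_overflowed_subxid)))
    {
        /*
         * xid lies outside every overflowed proc's recorded subxid
         * range, so it can't be an unlisted subxid: skip the
         * pg_subtrans lookup and treat it as a top-level xid.
         */
    }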

Greetings,

Andres Freund




Re: fixing CREATEROLE

2022-11-23 Thread David G. Johnston
On Wed, Nov 23, 2022 at 2:01 PM Robert Haas  wrote:


> In the latter case there are two, one with
>
> grantor=bootstrap_supeuser/admin_option=true/set_option=false/inherit_option=false
> and a second with
> grantor=alice/admin_option=false/set_option=true/inherit_option=true.
>

This, IMO, is preferable.  And I'd probably want to hide the first grant
from the user in typical cases - it is an implementation detail.

We have to grant the creating role membership in the created role, with
admin option, as a form of bookkeeping.

If the creating role really wants to be a member of the created role for
other reasons that should be done explicitly and granted by the creating
role.

This patch series need not be concerned about how easy or difficult it is
to get the additional grant entry into the database.  The ability to refine
the permissions in the data model is there so there should be no complaints
that "it is impossible to set up this combination of permissions".  We've
provided a detailed model and commands to alter it - the users can build
their scripts to glue those things together.

David J.


Re: fixing CREATEROLE

2022-11-23 Thread Tom Lane
"David G. Johnston"  writes:
> On Wed, Nov 23, 2022 at 2:18 PM Robert Haas  wrote:
>> Either way, I'm not quite sure what the benefit of converting these
>> things to predefined roles is.

> Specifically, you gain inheritance/set and "admin option" for free.

Right: the practical issue with CREATEROLE/CREATEDB is that you need
some mechanism for managing who can grant those privileges.  The
current answer isn't very flexible, which has been complained of
repeatedly.  If they become predefined roles then we get a lot of
already-built-out infrastructure to solve that, instead of having to
write even more single-purpose logic.  I think it's a sensible future
path, but said lack of flexibility hasn't yet spurred anyone to do it.

regards, tom lane




Re: Allow single table VACUUM in transaction block

2022-11-23 Thread Justin Pryzby
On Tue, Nov 22, 2022 at 05:16:59PM +, Simon Riggs wrote:
> Justin, if you wanted to take up the patch from here, I would be more
> than happy. You have the knowledge and insight to make this work
> right.

I have no particular use for this, so I wouldn't be a good person to
finish or shepherd the patch.

-- 
Justin




Re: Patch: Global Unique Index

2022-11-23 Thread Thomas Kellerer

Tom Lane schrieb am 18.11.2022 um 16:06:

Do we need new syntax actually? I think that a global unique index
can be created automatically instead of raising an error "unique
constraint on partitioned table must include all partitioning
columns"


I'm not convinced that we want this feature at all: as far as I can
see, it will completely destroy the benefits of making a partitioned
table in the first place.  But if we do want it, I don't think it
should be so easy to create a global index by accident as that syntax
approach would make it.  I think there needs to be a pretty clear YES
I WANT TO SHOOT MYSELF IN THE FOOT clause in the command.


There are many Oracle users that find global indexes useful despite
their disadvantages.

I have seen this mostly when the goal was to get the benefits of
partition pruning at runtime, which turned the full table scan (= Seq Scan)
on huge tables into partition scans on much smaller partitions.
Partition-wise joins were also helpful for query performance.
The substantially slower drop partition performance was accepted in those cases.

I think it would be nice to have the option in Postgres as well.

I do agree however, that the global index should not be created automatically.

Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better


Just my 0.05€




Re: Hash index build performance tweak from sorting

2022-11-23 Thread David Rowley
On Wed, 16 Nov 2022 at 17:33, Simon Riggs  wrote:
>
> Thanks for the review, apologies for the delay in acting upon your comments.
>
> My tests show the sorted and random tests are BOTH 4.6% faster with
> the v3 changes using 5-test avg, but you'll be pleased to know your
> kit is about 15.5% faster than mine, comparing absolute execution
> times.

Thanks for the updated patch.

I started to look at this again and I'm starting to think that the
HashInsertState struct is the wrong approach for passing along the
sorted flag to _hash_doinsert().  The reason I think this is that in
hashbuild() when setting buildstate.spool to NULL, you're also making
the decision about what to set the sorted flag to.  However, in
reality, we already know what we should be passing *every* time we
call _hash_doinsert().  The only place where we can pass the sorted
option as true is in _h_indexbuild() when we're doing the sorted
version of the index build. Trying to make that decision any sooner
seems error-prone.

I understand you have made HashInsertState so that we don't need to
add new parameters should we ever need to pass something else along,
but I'm just thinking that if we ever need to add more, then we should
just reconsider this in the future.  I think for today, the better
option is just to add the bool sorted as a parameter to
_hash_doinsert() and pass it as true in the single case where it's
valid to do so.  That seems less likely that we'll inherit some
options from some other place after some future modification and end
up passing sorted as true when it should be false.

Another reason I didn't like the HashInsertState idea is that in the
v3 patch there's an HashInsertState in both HashBuildState and HSpool.
Because in the normal insert path (hashinsert), we have neither a
HashBuildState nor an HSpool, you're having to fake up a
HashInsertStateData to pass something along to _hash_doinsert() in
hashinsert(). When we're building an index, in the non-sorted index
build case, you're always passing the HashInsertStateData from the
HashBuildState, but when we're doing the sorted index build the one
from HSpool is passed.  In other words, in each of the 3 calls to
_hash_doinsert(), the HashInsertStateData comes from a different
place.

Now, I do see that you've coded hashbuild() so both versions of the
HashInsertState point to the same HashInsertStateData, but I find it
unacceptable programming that in _h_spoolinit() the code palloc's the
memory for the HSpool and you're setting the istate field to the
HashInsertStateData that's on the stack. That just seems like a great
way to end up having istate pointing to junk should the HSpool ever
live beyond the hashbuild() call. If we really don't want HSpool to
live beyond hashbuild(), then it too should be a local variable to
hashbuild() instead of being palloc'ed in _h_spoolinit().
_h_spoolinit() could just be passed a pointer to the HSpool to
populate.
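
To spell out the hazard, a reduced illustration (function name made up)
of the pattern being objected to:

    /* the palloc'd spool outlives the stack variable it points at */
    static HSpool *
    bad_spool_init(void)
    {
        HashInsertStateData istate;     /* lives on this stack frame */
        HSpool     *spool = (HSpool *) palloc0(sizeof(HSpool));

        spool->istate = &istate;        /* dangling once we return */
        return spool;
    }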

After getting rid of the HashInsertState code and just adding bool
sorted to _hash_doinsert() and _hash_pgaddtup(), the resulting patch
is much simpler:

v3:
 src/backend/access/hash/hash.c   | 19 ---
 src/backend/access/hash/hashinsert.c | 40 ++--
 src/backend/access/hash/hashsort.c   |  8 ++--
 src/include/access/hash.h| 14 +++---
 4 files changed, 67 insertions(+), 14 deletions(-)

v4:
src/backend/access/hash/hash.c   |  4 ++--
src/backend/access/hash/hashinsert.c | 40 
src/backend/access/hash/hashsort.c   |  3 ++-
src/include/access/hash.h|  6 --
4 files changed, 40 insertions(+), 13 deletions(-)

and v4 includes 7 extra lines in hashinsert.c for the Assert() I
mentioned in my previous email plus a bunch of extra comments.

I'd rather see this solved like v4 is doing it.

David
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index c361509d68..77fd147f68 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -231,7 +231,7 @@ hashbuildCallback(Relation index,
itup = index_form_tuple(RelationGetDescr(index),
index_values, 
index_isnull);
itup->t_tid = *tid;
-   _hash_doinsert(index, itup, buildstate->heapRel);
+   _hash_doinsert(index, itup, buildstate->heapRel, false);
pfree(itup);
}
 
@@ -265,7 +265,7 @@ hashinsert(Relation rel, Datum *values, bool *isnull,
itup = index_form_tuple(RelationGetDescr(rel), index_values, 
index_isnull);
itup->t_tid = *ht_ctid;
 
-   _hash_doinsert(rel, itup, heapRel);
+   _hash_doinsert(rel, itup, heapRel, false);
 
pfree(itup);
 
diff --git a/src/backend/access/hash/hashinsert.c 
b/src/backend/access/hash/hashinsert.c
index 23907d2e5b..6718ff18f3 100644
--- a/src/backend/access/hash/hashinsert.c
+++ 

Re: [PATCH] minor optimization for ineq_histogram_selectivity()

2022-11-23 Thread Tom Lane
=?UTF-8?Q?Fr=c3=a9d=c3=a9ric_Yhuel?=  writes:
> On 10/24/22 17:26, Frédéric Yhuel wrote:
>> When studying the weird planner issue reported here [1], I came up with 
>> the attached patch. It reduces the probability of calling 
>> get_actual_variable_range().

> This isn't very useful anymore thanks to this patch: 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c

I hadn't looked at this patch before, but now that I have, I'm inclined
to reject it anyway.  It just moves the problem around: now, instead of
possibly doing an unnecessary index probe at the right end, you're
possibly doing an unnecessary index probe at the left end.  It also
looks quite weird compared to the normal coding of binary search.

I wonder if there'd be something to be said for leaving the initial
probe calculation alone and doing this:

        else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2)
+       {
+           /* Don't probe the endpoint until we have to. */
+           if (probe > lobound)
+               probe--;
+           else
                have_end = get_actual_variable_range(root,
                                                     vardata,
                                                     sslot.staop,
                                                     collation,
                                                     NULL,
                                                     &sslot.values[probe]);
+       }

On the whole though, it seems like a wart.

regards, tom lane




Re: Perform streaming logical transactions by background workers and parallel apply

2022-11-23 Thread Amit Kapila
On Tue, Nov 22, 2022 at 7:30 AM houzj.f...@fujitsu.com
 wrote:
>

Few minor comments and questions:

1.
+static void
+LogicalParallelApplyLoop(shm_mq_handle *mqh)
{
+ for (;;)
+ {
+ void*data;
+ Size len;
+
+ ProcessParallelApplyInterrupts();
...
...
+ if (rc & WL_LATCH_SET)
+ {
+ ResetLatch(MyLatch);
+ ProcessParallelApplyInterrupts();
+ }
...
}

Why is ProcessParallelApplyInterrupts() called twice in
LogicalParallelApplyLoop()?

2.
+ * This scenario is similar to the first case but TX-1 and TX-2 are executed by
+ * two parallel apply workers (PA-1 and PA-2 respectively). In this scenario,
+ * PA-2 is waiting for PA-1 to complete its transaction while PA-1 is waiting
+ * for subsequent input from LA. Also, LA is waiting for PA-2 to complete its
+ * transaction in order to preserve the commit order. There is a deadlock among
+ * three processes.
+ *
...
...
+ *
+ * LA (waiting to acquire the local transaction lock) -> PA-1 (waiting to
+ * acquire the lock on the unique index) -> PA-2 (waiting to acquire the lock
+ * on the remote transaction) -> LA
+ *

Isn't the order of PA-1 and PA-2 in the second paragraph different
from that in the first one?

3.
+ * Deadlock-detection
+ * --

It may be better to keep the title of this section as Locking Considerations.

4. In the section mentioned in Point 3, it would be better to
separately explain why we need session-level locks instead of
transaction level.

5. Add the below comments in the code:
diff --git a/src/backend/replication/logical/applyparallelworker.c
b/src/backend/replication/logical/applyparallelworker.c
index 9385afb6d2..56f00defcf 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -431,6 +431,9 @@ pa_free_worker_info(ParallelApplyWorkerInfo *winfo)
if (winfo->dsm_seg != NULL)
dsm_detach(winfo->dsm_seg);

+   /*
+    * Ensure this worker information won't be reused during worker
+    * allocation.
+    */
ParallelApplyWorkersList = list_delete_ptr(ParallelApplyWorkersList,

winfo);

@@ -762,6 +765,10 @@
HandleParallelApplyMessage(ParallelApplyWorkerInfo *winfo, StringInfo
msg)
 */
error_context_stack = apply_error_context_stack;

+   /*
+    * The actual error must have already been reported by the
+    * parallel apply worker.
+    */
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("parallel apply worker exited abnormally"),




-- 
With Regards,
Amit Kapila.




Re: [PATCH] minor optimization for ineq_histogram_selectivity()

2022-11-23 Thread Frédéric Yhuel




On 10/24/22 17:26, Frédéric Yhuel wrote:

Hello,

When studying the weird planner issue reported here [1], I came up with 
the attached patch. It reduces the probability of calling 
get_actual_variable_range().


This isn't very useful anymore thanks to this patch: 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c


Unless we want to save a hundred page reads in rare cases.




Re: Prefetch the next tuple's memory during seqscans

2022-11-23 Thread Bruce Momjian
On Wed, Nov  2, 2022 at 12:42:11AM +1300, Thomas Munro wrote:
> On Wed, Nov 2, 2022 at 12:09 AM Andy Fan  wrote:
> > In theory, why does the prefetch make things better? I am asking this
> > because I think we need to read the data from the buffer to the cache
> > line once in either case (I'm obviously wrong in the face of the test
> > result.)
> 
> CPUs have several different kinds of 'hardware prefetchers' (worth
> reading about), that look out for sequential and striding patterns and
> try to get the cache line ready before you access it.  Using the
> prefetch instructions explicitly is called 'software prefetching'
> (special instructions inserted by programmers or compilers).  The
> theory here would have to be that the hardware prefetchers couldn't
> pick up the pattern, but we know how to do it.  The exact details of
> the hardware prefetchers vary between chips, and there are even some
> parameters you can adjust in BIOS settings.  One idea is that the
> hardware prefetchers are generally biased towards increasing
> addresses, but our tuples tend to go backwards on the page[1].  It's
> possible that some other CPUs can detect backwards strides better, but
> since real world tuples aren't of equal size anyway, there isn't
> really a fixed stride at all, so software prefetching seems quite
> promising for this...
> 
> [1] 
> https://www.postgresql.org/docs/current/storage-page-layout.html#STORAGE-PAGE-LAYOUT-FIGURE
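
To make the hardware-vs-software distinction above concrete, here is a
minimal standalone sketch of software prefetching using the GCC/Clang
builtin __builtin_prefetch().  It is purely illustrative (not from any
patch in this thread), and the 8-element stride is an arbitrary
assumption:

#include <stddef.h>

/*
 * Sum an array while explicitly prefetching a later element, so its
 * cache line is (hopefully) resident by the time the loop reaches it.
 */
long
sum_with_prefetch(const long *a, size_t n)
{
	long		sum = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (i + 8 < n)
			__builtin_prefetch(&a[i + 8], 0, 3);	/* read, keep in cache */
		sum += a[i];
	}
	return sum;
}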

I remember someone showing that having our item pointers at the _end_ of
the page and tuples at the start moving toward the end increased
performance significantly.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Another multi-row VALUES bug

2022-11-23 Thread Dean Rasheed
I was thinking some more about the recent fix to multi-row VALUES
handling in the rewriter (b8f2687fdc), and I realised that there is
another bug in the way DEFAULT values are handled:

In RewriteQuery(), the code assumes that in a multi-row INSERT query,
the VALUES RTE will be the only thing in the query's fromlist. That's
true for the original query, but it's not necessarily the case for
product queries, if the rule action performs a multi-row insert,
leading to a new VALUES RTE that the DEFAULT-handling code might fail
to process. For example:

CREATE TABLE foo(a int);
INSERT INTO foo VALUES (1);

CREATE TABLE foo_log(t timestamptz DEFAULT now(), a int, c text);
CREATE RULE foo_r AS ON UPDATE TO foo
  DO ALSO INSERT INTO foo_log VALUES (DEFAULT, old.a, 'old'),
 (DEFAULT, new.a, 'new');

UPDATE foo SET a = 2 WHERE a = 1;

ERROR:  unrecognized node type: 43

There's a similar example to this in the regression tests, but it
doesn't test DEFAULT-handling.

It's also possible for the current code to cause the same VALUES RTE
to be rewritten multiple times, when recursing into product queries
(if the rule action doesn't add any more stuff to the query's
fromlist). That turns out to be harmless, because the second time
round it will no longer contain any defaults, but it's technically
incorrect, and certainly a waste of cycles.

So I think what the code needs to do is examine the targetlist, and
identify the VALUES RTE that the current query is using as a source,
and rewrite just that RTE (so any original VALUES RTE is rewritten at
the top level, and any VALUES RTEs from rule actions are rewritten
while recursing, and none are rewritten more than once), as in the
attached patch.

While at it, I noticed an XXX code comment questioning whether any of
this applies to MERGE. The answer is "no", because MERGE actions don't
allow multi-row inserts, so I think it's worth updating that comment
to make that clearer.

Regards,
Dean
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
new file mode 100644
index fb0c687..5ec4b91
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -3743,26 +3743,34 @@ RewriteQuery(Query *parsetree, List *rew
 		 */
 		if (event == CMD_INSERT)
 		{
+			ListCell   *lc2;
 			RangeTblEntry *values_rte = NULL;
 
 			/*
 			 * If it's an INSERT ... VALUES (...), (...), ... there will be a
-			 * single RTE for the VALUES targetlists.
+			 * unique VALUES RTE referred to by the targetlist.  At the top
+			 * level, this will be the VALUES lists from the original query,
+			 * and while recursively processing each product query, it will be
+			 * the VALUES lists from the rule action, if any.  As with
+			 * rewriteValuesRTE(), we need only worry about targetlist entries
+			 * that contain simple Vars referencing the VALUES RTE.
 			 */
-			if (list_length(parsetree->jointree->fromlist) == 1)
+			foreach(lc2, parsetree->targetList)
 			{
-				RangeTblRef *rtr = (RangeTblRef *) linitial(parsetree->jointree->fromlist);
+				TargetEntry *tle = (TargetEntry *) lfirst(lc2);
 
-				if (IsA(rtr, RangeTblRef))
+				if (IsA(tle->expr, Var))
 				{
-					RangeTblEntry *rte = rt_fetch(rtr->rtindex,
+					Var		   *var = (Var *) tle->expr;
+					RangeTblEntry *rte = rt_fetch(var->varno,
 												  parsetree->rtable);
 
 					if (rte->rtekind == RTE_VALUES)
 					{
 						values_rte = rte;
-						values_rte_index = rtr->rtindex;
+						values_rte_index = var->varno;
 					}
+					break;
 				}
 			}
 
@@ -3837,7 +3845,11 @@ RewriteQuery(Query *parsetree, List *rew
 		break;
 	case CMD_UPDATE:
 	case CMD_INSERT:
-		/* XXX is it possible to have a VALUES clause? */
+
+		/*
+		 * MERGE actions do not permit multi-row INSERTs, so
+		 * there is no VALUES RTE to deal with here.
+		 */
 		action->targetList =
 			rewriteTargetListIU(action->targetList,
 action->commandType,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
new file mode 100644
index 624d0e5..9c28dd9
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2938,11 +2938,11 @@ select pg_get_viewdef('shoe'::regclass,0
 --
 -- check multi-row VALUES in rules
 --
-create table rules_src(f1 int, f2 int);
-create table rules_log(f1 int, f2 int, tag text);
+create table rules_src(f1 int, f2 int default 0);
+create table rules_log(f1 int, f2 int, tag text, id serial);
 insert into rules_src values(1,2), (11,12);
 create rule r1 as on update to rules_src do also
-  insert into rules_log values(old.*, 'old'), (new.*, 'new');
+  insert into rules_log values(old.*, 'old', default), (new.*, 'new', default);
 update rules_src set f2 = f2 + 1;
 update rules_src set f2 = f2 * 10;
 select * from rules_src;
@@ -2953,16 +2953,16 @@ select * from rules_src;
 (2 rows)
 
 select * from rules_log;
- f1 | f2  | tag 
-+-+-
-  1 |   2 

Re: Prefetch the next tuple's memory during seqscans

2022-11-23 Thread David Rowley
On Wed, 23 Nov 2022 at 21:26, sirisha chamarthi
 wrote:
> Master
> After vacuum:
> latency average = 393.880 ms
>
> Master + 0001 + 0005
> After vacuum:
> latency average = 369.591 ms

Thank you for running those again.  Those results make more sense.
Would you mind also testing the count(*) query too?

David




Re: New docs chapter on Transaction Management and related changes

2022-11-23 Thread Bruce Momjian
On Wed, Nov 23, 2022 at 02:17:19AM -0600, Justin Pryzby wrote:
> On Tue, Nov 22, 2022 at 01:50:36PM -0500, Bruce Momjian wrote:
> > +
> > +  
> > +   A more complex example with multiple nested subtransactions:
> > +
> > +BEGIN;
> > +INSERT INTO table1 VALUES (1);
> > +SAVEPOINT sp1;
> > +INSERT INTO table1 VALUES (2);
> > +SAVEPOINT sp2;
> > +INSERT INTO table1 VALUES (3);
> > +RELEASE SAVEPOINT sp2;
> > +INSERT INTO table1 VALUES (4))); -- generates an error
> > +
> > +   In this example, the application requests the release of the savepoint
> > +   sp2, under which the value 3 was inserted.  This changes the insert's
> > +   transaction context to sp1.  When the statement
> > +   attempting to insert value 4 generates an error, the insertions of 2 and
> > +   4 are lost because they are in the same, now-rolled-back savepoint,
> > +   and value 3 is in the same transaction context.  The application can
> > +   now only choose one of these two commands, since all other commands
> > +   will be ignored with a warning:
> > +
> > +   ROLLBACK;
> > +   ROLLBACK TO SAVEPOINT sp1;
> > +
> > +   Choosing ROLLBACK will abort everything, including
> > +   value 1, whereas ROLLBACK TO SAVEPOINT sp1 will 
> > retain
> > +   value 1 and allow the transaction to continue.
> > +  
> 
> This mentions a warning, but what happens is actually an error:
> 
> postgres=!# select;
> ERROR:  current transaction is aborted, commands ignored until end of 
> transaction block

Good point, new text:

   The application can now only choose one of these two commands,
   since all other commands will be ignored:

Updated patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 24b1624bad..01c65ede4c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7229,12 +7229,14 @@ local0.*/var/log/postgresql
 
 
  %v
- Virtual transaction ID (backendID/localXID)
+ Virtual transaction ID (backendID/localXID);  see
+ 
  no
 
 
  %x
- Transaction ID (0 if none is assigned)
+ Transaction ID (0 if none is assigned);  see
+ 
  no
 
 
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index b030b36002..fdffba4442 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4992,7 +4992,8 @@ WHERE ...
 xmin and xmax.  Transaction identifiers are 32-bit quantities.
 In some contexts, a 64-bit variant xid8 is used.  Unlike
 xid values, xid8 values increase strictly
-monotonically and cannot be reused in the lifetime of a database cluster.
+monotonically and cannot be reused in the lifetime of a database
+cluster.  See  for more details.

 

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index de450cd661..0d6be9a2fa 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -104,6 +104,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 82fba48d5f..8affaf2a3d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24676,7 +24676,10 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns the current transaction's ID.  It will assign a new one if the
 current transaction does not have one already (because it has not
-performed any database updates).
+performed any database updates);  see  for details.  If executed in a
+subtransaction, this will return the top-level transaction ID;
+see  for details.

   
 
@@ -24693,6 +24696,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
 ID is assigned yet.  (It's best to use this variant if the transaction
 might otherwise be read-only, to avoid unnecessary consumption of an
 XID.)
+If executed in a subtransaction, this will return the top-level
+transaction ID.

   
 
@@ -24736,6 +24741,9 @@ SELECT collation for ('foo' COLLATE "de_DE");

 Returns a current snapshot, a data structure
 showing which transaction IDs are now in-progress.
+Only top-level transaction IDs are included in the snapshot;
+subtransaction IDs are not shown;  see 
+for details.

   
 
@@ -24790,7 +24798,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
 Is the given transaction ID visible according
 to this snapshot (that is, was it completed before the snapshot was
 taken)?  Note that this function will not give the correct answer for
-a subtransaction ID.
+a subtransaction ID (subxid);  see  for
+details.

 

Re: fixing CREATEROLE

2022-11-23 Thread Mark Dilger



> On Nov 23, 2022, at 9:01 AM, Robert Haas  wrote:
> 
>> That's not to say that I wouldn't rather that it always work one way or 
>> always the other.  It's just to say that I don't want it to work differently 
>> based on some poorly advertised property of the role executing the command.
> 
> That seems rather pejorative. If we stipulate from the outset that the
> property is poorly advertised, then obviously anything at all
> depending on it is not going to seem like a very good idea. But why
> should we assume that it will be poorly-advertised? I clearly said
> that the documentation needs a bunch of work, and that I planned to
> work on it. As an initial matter, the documentation is where we
> advertise new features, so I think you ought to take it on faith that
> this will be well-advertised, unless you think that I'm completely
> hopeless at writing documentation or something.

Oh, I don't mean that it will be poorly documented.  I mean that the way the 
command is written won't advertise what it is going to do.  That's concerning 
if you fat-finger a CREATE ROLE command, then realize you need to drop and 
recreate the role, only to discover that a property you weren't thinking about, 
and which you are accustomed to being set the opposite way, is set such that 
you can't drop the role you just created.  I think if you're going to 
create-and-disown something, you should have to say so, to make sure you mean 
it.

This consideration differs from the default schema or default tablespace 
settings.  If I fat-finger the creation of a table, regardless of where it gets 
placed, I'm still the owner of the table, and I can still drop and recreate the 
table to fix my mistake.

Why not make this a permissions issue, rather than a default behavior issue? 
 Instead of a single CREATEROLE privilege, grant roles privileges to 
CREATE-WITH-INHERIT, CREATE-WITH-ADMIN, CREATE-SANS-INHERIT, CREATE-SANS-ADMIN, 
and thereby limit which forms of the command they may execute.  That way, the 
semantics of the command do not depend on some property external to the 
command.  Users (and older scripts) will expect the traditional syntax to 
behave consistently with how CREATE ROLE has worked in the past.  The behaviors 
can be specified explicitly.

> On the actual issue, I think that one key question is who should
> control what happens when a role is created. Is that the superuser's
> decision, or the CREATEROLE user's decision? I think it's better for
> it to be the superuser's decision. Consider first the use case where
> you want to set up a user who "feels like a superuser" i.e. inherits
> the privileges of users they create. You don't want them to have to
> specify anything when they create a role for that to happen. You just
> want it to happen. So you want to set up their account so that it will
> happen automatically, not throw the complexity back on them. In the
> reverse scenario where you don't want the privileges inherited, I
> think it's a little less clear, possibly just because I haven't
> thought about that scenario as much, but I think it's very reasonable
> here too to want the superuser to set up a configuration for the
> CREATEROLE user that does what the superuser wants, rather than what
> the CREATEROLE user wants.
> 
> Even aside from the question of who controls what, I think it is far
> better from a usability perspective to have ways of setting up good
> defaults. That is why we have the default_tablespace GUC, for example.
> We could have made the CREATE TABLE command always use the database's
> default tablespace, or we could have made it always use the main
> tablespace. Then it would not be dependent on (poorly advertised?)
> settings elsewhere. But it would also be really inconvenient, because
> if someone is creating a lot of tables and wants them all to end up in
> the same place, they don't want to have to specify the name of that
> tablespace each time. They want to set a default and have that get
> used by each command.
> 
> There's another, subtler consideration here, too. Since
> ce6b672e4455820a0348214be0da1a024c3f619f, there are constraints on who
> can validly be recorded as the grantor of a particular role grant,
> just as we have always done for other types of grants. The grants have
> to form a tree, with each grant having a grantor that was themselves
> granted ADMIN OPTION by someone else, until eventually you get back to
> the bootstrap superuser who is the source of all privileges. Thus,
> today, when a CREATEROLE user grants membership in a role, the grantor
> is recorded as the bootstrap superuser, because they might not
> actually possess ADMIN OPTION on the role at all, and so we can't
> necessarily record them as the grantor. But this patch changes that,
> which I think is a significant improvement. The implicit grant that is
> created when CREATE ROLE is issued has the bootstrap superuser as
> grantor, because there is no other legal option, but then any further

Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency

2022-11-23 Thread Peter Geoghegan
On Wed, Nov 23, 2022 at 2:54 AM Alvaro Herrera  wrote:
> Something like the attached.  It would result in output like this:
> WARNING:  new multixact has more than one updating member: 0 2[17378 (keysh), 
> 17381 (nokeyupd)]
>
> Then it should be possible to trace (in pg_waldump output) the
> operations of each of the transactions that have any status in the
> multixact that includes some form of "upd".

That seems very useful.

Separately, I wonder if it would make sense to add additional
defensive checks to FreezeMultiXactId() for this. There is an
assertion that should catch the presence of multiple updaters in a
single Multi when it looks like we have to generate a new Multi to
carry the XID members forward (typically something we only need to do
during a VACUUM FREEZE). We could at least make that
"Assert(!TransactionIdIsValid(update_xid));" line into a defensive
"can't happen" ereport(). It couldn't hurt, at least -- we already
have a similar relfrozenxid check nearby, added after the "freeze the
dead" bug was fixed.
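
For illustration, inside FreezeMultiXactId() such a defensive check
might look roughly like this (a sketch only; the exact errcode and
message wording here are assumptions, not a proposed patch):

	if (TransactionIdIsValid(update_xid))
		ereport(ERROR,
				(errcode(ERRCODE_DATA_CORRUPTED),
				 errmsg_internal("multixact %u has more than one updating member",
								 multi)));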

-- 
Peter Geoghegan




Re: Prefetch the next tuple's memory during seqscans

2022-11-23 Thread Bruce Momjian
On Wed, Nov 23, 2022 at 11:03:22AM -0500, Bruce Momjian wrote:
> > CPUs have several different kinds of 'hardware prefetchers' (worth
> > reading about), that look out for sequential and striding patterns and
> > try to get the cache line ready before you access it.  Using the
> > prefetch instructions explicitly is called 'software prefetching'
> > (special instructions inserted by programmers or compilers).  The
> > theory here would have to be that the hardware prefetchers couldn't
> > pick up the pattern, but we know how to do it.  The exact details of
> > the hardware prefetchers vary between chips, and there are even some
> > parameters you can adjust in BIOS settings.  One idea is that the
> > hardware prefetchers are generally biased towards increasing
> > addresses, but our tuples tend to go backwards on the page[1].  It's
> > possible that some other CPUs can detect backwards strides better, but
> > since real world tuples aren't of equal size anyway, there isn't
> > really a fixed stride at all, so software prefetching seems quite
> > promising for this...
> > 
> > [1] 
> > https://www.postgresql.org/docs/current/storage-page-layout.html#STORAGE-PAGE-LAYOUT-FIGURE
> 
> I remember someone showing that having our item pointers at the _end_ of
> the page and tuples at the start moving toward the end increased
> performance significantly.

Ah, I found it, from 2017, with a 15-25% slowdown:


https://www.postgresql.org/message-id/20171108205943.tps27i2tujsstrg7%40alap3.anarazel.de

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: Another multi-row VALUES bug

2022-11-23 Thread Tom Lane
Dean Rasheed  writes:
> In RewriteQuery(), the code assumes that in a multi-row INSERT query,
> the VALUES RTE will be the only thing in the query's fromlist. That's
> true for the original query, but it's not necessarily the case for
> product queries, if the rule action performs a multi-row insert,
> leading to a new VALUES RTE that the DEFAULT-handling code might fail
> to process. For example:

> CREATE TABLE foo(a int);
> INSERT INTO foo VALUES (1);

> CREATE TABLE foo_log(t timestamptz DEFAULT now(), a int, c text);
> CREATE RULE foo_r AS ON UPDATE TO foo
>   DO ALSO INSERT INTO foo_log VALUES (DEFAULT, old.a, 'old'),
>  (DEFAULT, new.a, 'new');

> UPDATE foo SET a = 2 WHERE a = 1;

> ERROR:  unrecognized node type: 43

Ugh.

> So I think what the code needs to do is examine the targetlist, and
> identify the VALUES RTE that the current query is using as a source,
> and rewrite just that RTE (so any original VALUES RTE is rewritten at
> the top level, and any VALUES RTEs from rule actions are rewritten
> while recursing, and none are rewritten more than once), as in the
> attached patch.

Hmm ... this patch does not feel any more principled or future-proof
than what it replaces, because now instead of making assumptions
about what's in the jointree, you're making assumptions about what's
in the targetlist.  I wonder if there is some other way to identify
the target VALUES RTE.

Looking at the parsetree in gdb, I see that in this example the
VALUES RTE is still the first entry in the fromlist, it's just not
the only one there.  So I wonder whether it'd be sufficient to do

-if (list_length(parsetree->jointree->fromlist) == 1)
+if (parsetree->jointree->fromlist != NIL)

I'm not 100% sure that product-query rewriting would always produce
a FROM-list in this order, but I think it might be true.

Another idea is to identify the VALUES RTE before we start rewriting,
and pass that information on.  That should be pretty bulletproof,
but of course more invasive.

Or ... maybe we should perform this particular step before we build
product queries?  Just because we stuck it into QueryRewrite
originally doesn't mean that's the right place.

regards, tom lane




Re: Non-decimal integer literals

2022-11-23 Thread Dean Rasheed
On Tue, 22 Nov 2022 at 13:37, Peter Eisentraut
 wrote:
>
> On 15.11.22 11:31, Peter Eisentraut wrote:
> > On 14.11.22 08:25, John Naylor wrote:
> >> Regarding the patch, it looks good overall. My only suggestion would
> >> be to add a regression test for just below and just above overflow, at
> >> least for int2.
> >
> This was a valuable suggestion, because this found some breakage.  In
> particular, the handling of grammar-level literals that overflow to
> "Float" was not correct.  (The radix prefix was simply stripped and
> forgotten.)  So I added a bunch more tests for this.  Here is a new patch.

Taking a quick look, I noticed that there are no tests for negative
values handled in the parser.

Giving that a spin shows that make_const() fails to correctly identify
the base of negative non-decimal integers in the T_Float case, causing
it to fall through to numeric_in() and fail:

SELECT -0x8000;

ERROR:  invalid input syntax for type numeric: "-0x8000"
   ^
Regards,
Dean




Bug in MERGE's test for tables with rules

2022-11-23 Thread Dean Rasheed
While playing around with rules and MERGE, I noticed that there is a
bug in the way that it detects whether the target table has rules ---
it uses rd_rel->relhasrules, which can be incorrect, since it might be
set for a table that doesn't currently have rules, but did in the
recent past.

So it actually needs to examine rd_rules. Technically, I think that it
would be sufficient to just test whether rd_rules is non-NULL, but I
think it's more robust and readable to check rd_rules->numLocks, as in
the attached patch.

Regards,
Dean
diff --git a/src/backend/parser/parse_merge.c b/src/backend/parser/parse_merge.c
new file mode 100644
index 7913523..62c2ff6
--- a/src/backend/parser/parse_merge.c
+++ b/src/backend/parser/parse_merge.c
@@ -182,7 +182,8 @@ transformMergeStmt(ParseState *pstate, M
  errmsg("cannot execute MERGE on relation \"%s\"",
 		RelationGetRelationName(pstate->p_target_relation)),
  errdetail_relkind_not_supported(pstate->p_target_relation->rd_rel->relkind)));
-	if (pstate->p_target_relation->rd_rel->relhasrules)
+	if (pstate->p_target_relation->rd_rules != NULL &&
+		pstate->p_target_relation->rd_rules->numLocks > 0)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot execute MERGE on relation \"%s\"",
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
new file mode 100644
index 624d0e5..ad1a8c9
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -3532,6 +3532,30 @@ MERGE INTO rule_merge2 t USING (SELECT 1
 		DELETE
 	WHEN NOT MATCHED THEN
 		INSERT VALUES (s.a, '');
+SELECT * FROM rule_merge2;
+ a | b 
+---+---
+ 1 | 
+(1 row)
+
+-- and should be ok after dropping rules
+DROP RULE rule1 ON rule_merge1;
+DROP RULE rule2 ON rule_merge1;
+DROP RULE rule3 ON rule_merge1;
+MERGE INTO rule_merge1 t USING (SELECT 1 AS a) s
+	ON t.a = s.a
+	WHEN MATCHED AND t.a < 2 THEN
+		UPDATE SET b = b || ' updated by merge, had rules'
+	WHEN MATCHED AND t.a > 2 THEN
+		DELETE
+	WHEN NOT MATCHED THEN
+		INSERT VALUES (s.a, 'had rules');
+SELECT * FROM rule_merge1;
+ a | b 
+---+---
+ 1 | had rules
+(1 row)
+
 --
 -- Test enabling/disabling
 --
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
new file mode 100644
index bfb5f3b..8a728be
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -1264,6 +1264,21 @@ MERGE INTO rule_merge2 t USING (SELECT 1
 		DELETE
 	WHEN NOT MATCHED THEN
 		INSERT VALUES (s.a, '');
+SELECT * FROM rule_merge2;
+
+-- and should be ok after dropping rules
+DROP RULE rule1 ON rule_merge1;
+DROP RULE rule2 ON rule_merge1;
+DROP RULE rule3 ON rule_merge1;
+MERGE INTO rule_merge1 t USING (SELECT 1 AS a) s
+	ON t.a = s.a
+	WHEN MATCHED AND t.a < 2 THEN
+		UPDATE SET b = b || ' updated by merge, had rules'
+	WHEN MATCHED AND t.a > 2 THEN
+		DELETE
+	WHEN NOT MATCHED THEN
+		INSERT VALUES (s.a, 'had rules');
+SELECT * FROM rule_merge1;
 
 --
 -- Test enabling/disabling


Re: [PATCH] Const'ify the arguments of ilist.c/ilist.h functions

2022-11-23 Thread Aleksander Alekseev
Hi Andres,

Thanks for the review!

> I don't think it is correct for any of these to add const. The only reason it
> works is because of casting etc.

Fair enough. PFA the corrected patch v2.

-- 
Best regards,
Aleksander Alekseev


v2-0001-Constify-the-arguments-of-ilist.c-h-functions.patch
Description: Binary data


Re: drop postmaster symlink

2022-11-23 Thread Devrim Gündüz
Hi,

On Wed, 2022-11-23 at 09:18 -0500, Joe Conway wrote:
> I am a big +1 on removing the symlink, however it is worth pointing
> out 
> that the PGDG RPMs still use the symlink in the included systemd
> service 
> file:
> 
> 8<--
> ExecStart=/usr/pgsql-15/bin/postmaster -D ${PGDATA}


...and it helps us to find the "main" process a bit easily.

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR


signature.asc
Description: This is a digitally signed message part


Re: drop postmaster symlink

2022-11-23 Thread Tom Lane
Devrim =?ISO-8859-1?Q?G=FCnd=FCz?=  writes:
> ...and it helps us to find the "main" process a bit easily.

Hmm, that's a nontrivial point perhaps.  It's certain that this
will break some other people's start scripts too.  On the whole,
is it really that hard to add the symlink to the meson build?

regards, tom lane




Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

2022-11-23 Thread Greg Stark
On Mon, 21 Nov 2022 at 15:01, Andres Freund  wrote:
>
> It's somewhat sad to add this restriction - I've used get_raw_page() (+
> other functions) to scan a whole database for a bug. IIRC that actually
> did end up using parallelism, albeit likely not very efficiently.
>
> Don't really have a better idea though.

Given how specific the use case is here a simple solution would be to
just have a dedicated get_raw_temp_page() and restrict get_raw_page()
to persistent tables.

I suppose slightly gilding it would be to make a get_raw_page_temp()
and get_raw_page_persistent() and then you could have get_raw_page()
call the appropropriate one. They would be parallel restricted except
for get_raw_page_persistent() and if you explicitly called it you
could get parallel scans otherwise you wouldn't.

-- 
greg




Re: Hash index build performance tweak from sorting

2022-11-23 Thread David Rowley
On Fri, 18 Nov 2022 at 03:34, Tomas Vondra
 wrote:
> I did some simple benchmark with v2 and v3, using the attached script,
> which essentially just builds hash index on random data, with different
> data types and maintenance_work_mem values. And what I see is this
> (median of 10 runs):

> So to me it seems v2 performs demonstrably better, v3 is consistently
> slower - not only compared to v2, but often also to master.

Could this just be down to code alignment changes?  There does not
really seem to be any fundamental differences which would explain
this.

David




Re: Bug in MERGE's test for tables with rules

2022-11-23 Thread Tom Lane
Dean Rasheed  writes:
> While playing around with rules and MERGE, I noticed that there is a
> bug in the way that it detects whether the target table has rules ---
> it uses rd_rel->relhasrules, which can be incorrect, since it might be
> set for a table that doesn't currently have rules, but did in the
> recent past.

> So it actually needs to examine rd_rules. Technically, I think that it
> would be sufficient to just test whether rd_rules is non-NULL, but I
> think it's more robust and readable to check rd_rules->numLocks, as in
> the attached patch.

+1 for the code change.  Not quite sure the added test case is worth
the cycles.

regards, tom lane




Re: fixing CREATEROLE

2022-11-23 Thread Robert Haas
On Tue, Nov 22, 2022 at 5:48 PM Mark Dilger
 wrote:
> Whatever behavior is to happen in the CREATE ROLE statement should be spelled 
> out in that statement.  "CREATE ROLE bob WITH INHERIT false WITH SET false" 
> doesn't seem too unwieldy, and has the merit that it can be read and 
> understood without reference to hidden parameters.  Forcing this to be 
> explicit should be safer if these statements ultimately make their way into 
> dump/restore scripts, or into logical replication.
>
> That's not to say that I wouldn't rather that it always work one way or 
> always the other.  It's just to say that I don't want it to work differently 
> based on some poorly advertised property of the role executing the command.

That seems rather pejorative. If we stipulate from the outset that the
property is poorly advertised, then obviously anything at all
depending on it is not going to seem like a very good idea. But why
should we assume that it will be poorly-advertised? I clearly said
that the documentation needs a bunch of work, and that I planned to
work on it. As an initial matter, the documentation is where we
advertise new features, so I think you ought to take it on faith that
this will be well-advertised, unless you think that I'm completely
hopeless at writing documentation or something.

On the actual issue, I think that one key question is who should
control what happens when a role is created. Is that the superuser's
decision, or the CREATEROLE user's decision? I think it's better for
it to be the superuser's decision. Consider first the use case where
you want to set up a user who "feels like a superuser" i.e. inherits
the privileges of users they create. You don't want them to have to
specify anything when they create a role for that to happen. You just
want it to happen. So you want to set up their account so that it will
happen automatically, not throw the complexity back on them. In the
reverse scenario where you don't want the privileges inherited, I
think it's a little less clear, possibly just because I haven't
thought about that scenario as much, but I think it's very reasonable
here too to want the superuser to set up a configuration for the
CREATEROLE user that does what the superuser wants, rather than what
the CREATEROLE user wants.

Even aside from the question of who controls what, I think it is far
better from a usability perspective to have ways of setting up good
defaults. That is why we have the default_tablespace GUC, for example.
We could have made the CREATE TABLE command always use the database's
default tablespace, or we could have made it always use the main
tablespace. Then it would not be dependent on (poorly advertised?)
settings elsewhere. But it would also be really inconvenient, because
if someone is creating a lot of tables and wants them all to end up in
the same place, they don't want to have to specify the name of that
tablespace each time. They want to set a default and have that get
used by each command.

There's another, subtler consideration here, too. Since
ce6b672e4455820a0348214be0da1a024c3f619f, there are constraints on who
can validly be recorded as the grantor of a particular role grant,
just as we have always done for other types of grants. The grants have
to form a tree, with each grant having a grantor that was themselves
granted ADMIN OPTION by someone else, until eventually you get back to
the bootstrap superuser who is the source of all privileges. Thus,
today, when a CREATEROLE user grants membership in a role, the grantor
is recorded as the bootstrap superuser, because they might not
actually possess ADMIN OPTION on the role at all, and so we can't
necessarily record them as the grantor. But this patch changes that,
which I think is a significant improvement. The implicit grant that is
created when CREATE ROLE is issued has the bootstrap superuser as
grantor, because there is no other legal option, but then any further
grants performed by the CREATE ROLE user rely on that user having that
grant, and thus record the OID of the CREATEROLE user as the grantor,
not the bootstrap superuser.

That, in turn, has a number of rather nice consequences. It means in
particular that the CREATEROLE user can't remove the implicit grant,
nor can they alter it. They are, after all, not the grantor, who is
the bootstrap superuser, nor do they any longer have the authority to
act as the bootstrap superuser. Thus, if you have two or more
CREATEROLE users running around doing stuff, you can tell who did
what. Every role that those users created is linked back to the
creating role in a way that the creator can't alter. A CREATEROLE user
is unable to contrive a situation where they no longer control a role
that they created. That also means that the superuser, if desired, can
revoke all membership grants performed by any particular CREATEROLE
user by revoking the implicit grants with CASCADE.

But since this implicit grant has, and must have, the bootstrap
superuser as 

code cleanups

2022-11-23 Thread Justin Pryzby
Some modest cleanups I've accumulated.
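
The first patch replaces explicit zero-filling loops with empty
initializers.  As a minimal standalone illustration of the idiom (C
guarantees that elements without an explicit initializer are
zero-initialized):

#include <stdbool.h>

#define NATTS 8

void
init_example(void)
{
	bool		nulls_loop[NATTS];
	bool		nulls_init[NATTS] = {0};	/* every element is false */

	/* the loop form that the patch removes */
	for (int i = 0; i < NATTS; i++)
		nulls_loop[i] = false;

	(void) nulls_loop;
	(void) nulls_init;
}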
From 9c5846cc3f04c6797696a234fac0953815e09e4b Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 23 Oct 2022 14:52:48 -0500
Subject: [PATCH 1/6] WIP: do not loop to set an array to false

See also:
9fd45870c1436b477264c0c82eb195df52bc0919
572e3e6634e55accf95e2bcfb1340019c86a21dc
---
 src/backend/access/transam/xlogprefetcher.c |  5 +---
 src/backend/catalog/pg_constraint.c | 11 ++--
 src/backend/commands/user.c |  6 +
 src/backend/statistics/extended_stats.c |  6 +
 src/backend/storage/smgr/md.c   |  3 +--
 src/backend/tcop/pquery.c   |  3 +--
 src/backend/utils/adt/arrayfuncs.c  | 29 -
 src/test/isolation/isolationtester.c|  3 +--
 8 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/transam/xlogprefetcher.c b/src/backend/access/transam/xlogprefetcher.c
index 0cf03945eec..7fcfd146525 100644
--- a/src/backend/access/transam/xlogprefetcher.c
+++ b/src/backend/access/transam/xlogprefetcher.c
@@ -832,13 +832,10 @@ pg_stat_get_recovery_prefetch(PG_FUNCTION_ARGS)
 #define PG_STAT_GET_RECOVERY_PREFETCH_COLS 10
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 	Datum		values[PG_STAT_GET_RECOVERY_PREFETCH_COLS];
-	bool		nulls[PG_STAT_GET_RECOVERY_PREFETCH_COLS];
+	bool		nulls[PG_STAT_GET_RECOVERY_PREFETCH_COLS] = {0};
 
 	InitMaterializedSRF(fcinfo, 0);
 
-	for (int i = 0; i < PG_STAT_GET_RECOVERY_PREFETCH_COLS; ++i)
-		nulls[i] = false;
-
 	values[0] = TimestampTzGetDatum(pg_atomic_read_u64(>reset_time));
 	values[1] = Int64GetDatum(pg_atomic_read_u64(>prefetch));
 	values[2] = Int64GetDatum(pg_atomic_read_u64(>hit));
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 3a5d4e96439..bd8f32bf62d 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -82,8 +82,8 @@ CreateConstraintEntry(const char *constraintName,
 	Relation	conDesc;
 	Oid			conOid;
 	HeapTuple	tup;
-	bool		nulls[Natts_pg_constraint];
-	Datum		values[Natts_pg_constraint];
+	bool		nulls[Natts_pg_constraint] = {0};
+	Datum		values[Natts_pg_constraint] = {0};
 	ArrayType  *conkeyArray;
 	ArrayType  *confkeyArray;
 	ArrayType  *conpfeqopArray;
@@ -165,13 +165,6 @@ CreateConstraintEntry(const char *constraintName,
 	else
 		conexclopArray = NULL;
 
-	/* initialize nulls and values */
-	for (i = 0; i < Natts_pg_constraint; i++)
-	{
-		nulls[i] = false;
-		values[i] = (Datum) NULL;
-	}
-
 	conOid = GetNewOidWithIndex(conDesc, ConstraintOidIndexId,
 Anum_pg_constraint_oid);
 	values[Anum_pg_constraint_oid - 1] = ObjectIdGetDatum(conOid);
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 8b6543edee2..5af79792136 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -1234,8 +1234,7 @@ RenameRole(const char *oldname, const char *newname)
 	bool		isnull;
 	Datum		repl_val[Natts_pg_authid];
 	bool		repl_null[Natts_pg_authid];
-	bool		repl_repl[Natts_pg_authid];
-	int			i;
+	bool		repl_repl[Natts_pg_authid] = {0};
 	Oid			roleid;
 	ObjectAddress address;
 	Form_pg_authid authform;
@@ -1321,9 +1320,6 @@ RenameRole(const char *oldname, const char *newname)
 	}
 
 	/* OK, construct the modified tuple */
-	for (i = 0; i < Natts_pg_authid; i++)
-		repl_repl[i] = false;
-
 	repl_repl[Anum_pg_authid_rolname - 1] = true;
 	repl_val[Anum_pg_authid_rolname - 1] = DirectFunctionCall1(namein,
 			   CStringGetDatum(newname));
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index ab97e71dd79..93a9d3eeafb 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -2343,7 +2343,7 @@ serialize_expr_stats(AnlExprData *exprdata, int nexprs)
 		VacAttrStats *stats = exprdata[exprno].vacattrstat;
 
 		Datum		values[Natts_pg_statistic];
-		bool		nulls[Natts_pg_statistic];
+		bool		nulls[Natts_pg_statistic] = {0};
 		HeapTuple	stup;
 
 		if (!stats->stats_valid)
@@ -2359,10 +2359,6 @@ serialize_expr_stats(AnlExprData *exprdata, int nexprs)
 		/*
 		 * Construct a new pg_statistic tuple
 		 */
-		for (i = 0; i < Natts_pg_statistic; ++i)
-		{
-			nulls[i] = false;
-		}
 
 		values[Anum_pg_statistic_starelid - 1] = ObjectIdGetDatum(InvalidOid);
 		values[Anum_pg_statistic_staattnum - 1] = Int16GetDatum(InvalidAttrNumber);
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 14b6fa0fd90..2d5cc97eaa9 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -557,8 +557,7 @@ void
 mdopen(SMgrRelation reln)
 {
 	/* mark it not open */
-	for (int forknum = 0; forknum <= MAX_FORKNUM; forknum++)
-		reln->md_num_open_segs[forknum] = 0;
+	memset(reln->md_num_open_segs, 0, (1+MAX_FORKNUM) * sizeof(*reln->md_num_open_segs));
 }
 
 /*
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 

Re: Hash index build performance tweak from sorting

2022-11-23 Thread Simon Riggs
On Wed, 23 Nov 2022 at 13:04, David Rowley  wrote:

> After getting rid of the  HashInsertState code and just adding bool
> sorted to _hash_doinsert() and _hash_pgaddtup(), the resulting patch
> is much more simple:

Seems good to me and I wouldn't argue with any of your comments.

> and v4 includes 7 extra lines in hashinsert.c for the Assert() I
> mentioned in my previous email plus a bunch of extra comments.

Oh, I did already include that in v3 as requested.

> I'd rather see this solved like v4 is doing it.

Please do. No further comments. Thanks for your help

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Decouple last important WAL record LSN from WAL insert locks

2022-11-23 Thread Bharath Rupireddy
Hi,

While working on something else, I noticed that each WAL insert lock
tracks its own last important WAL record's LSN (lastImportantAt) and
both the bgwriter and checkpointer later compute the max
value/server-wide last important WAL record's LSN via
GetLastImportantRecPtr(). While doing so, each WAL insertion lock is
acquired in exclusive mode in a for loop. This seems like too much
overhead to me. I quickly coded a patch (attached herewith) that
tracks the server-wide last important WAL record's LSN in
XLogCtlInsert (lastImportantPos) protected with a spinlock and gets
rid of lastImportantAt from each WAL insert lock. I ran pgbench with a
simple insert [1] and the results are below. While the test was run,
GetLastImportantRecPtr() was called 4-5 times.

# of clients    HEAD    PATCHED
1                 83         82
2                159        157
4                303        302
8                576        570
16              1104       1095
32              2055       2041
64              2286       2295
128             2270       2285
256             2302       2253
512             2205       2290
768             2224       2180
1024            2109       2150
2048            1941       1936
4096            1856       1848

It doesn't seem to hurt anyone (for this use case); however, there
might be some benefit if the bgwriter and checkpointer come in the way
of WAL inserters.
insert locks is gone. Since the amount of work the WAL inserters do
under the new spinlock is very minimal (updating
XLogCtlInsert->lastImportantPos), it may not be an issue. Also, it's
worthwhile to look at the existing comment [2], which doesn't talk
about the performance impact of having a lock.
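
To make the shape of the change concrete, a rough sketch of the
spinlock-protected path (field and variable names here follow the
description above but are assumptions of this sketch, not final
naming):

/* In XLogCtlInsert: one server-wide value instead of per-lock copies. */
	slock_t		info_lck;			/* protects lastImportantPos */
	XLogRecPtr	lastImportantPos;	/* end of last important record */

/* Writer side, in XLogInsertRecord() for "important" records: */
	SpinLockAcquire(&Insert->info_lck);
	if (EndPos > Insert->lastImportantPos)
		Insert->lastImportantPos = EndPos;
	SpinLockRelease(&Insert->info_lck);

/* Reader side, GetLastImportantRecPtr() no longer loops over the locks: */
	SpinLockAcquire(&Insert->info_lck);
	res = Insert->lastImportantPos;
	SpinLockRelease(&Insert->info_lck);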

Thoughts?

[1]
./configure --prefix=$PWD/inst/ CFLAGS="-O3" > install.log && make -j 8 install > install.log 2>&1 &
cd inst/bin
./pg_ctl -D data -l logfile stop
rm -rf data logfile insert.sql
free -m
sudo su -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
free -m
./initdb -D data
./pg_ctl -D data -l logfile start
./psql -d postgres -c 'ALTER SYSTEM SET shared_buffers = "8GB";'
./psql -d postgres -c 'ALTER SYSTEM SET max_wal_size = "32GB";'
./psql -d postgres -c 'ALTER SYSTEM SET max_connections = "4096";'
./psql -d postgres -c 'ALTER SYSTEM SET bgwriter_delay = "10ms";'
./pg_ctl -D data -l logfile restart
./pgbench -i -s 1 -d postgres
./psql -d postgres -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT
pgbench_accounts_pkey;"
cat << EOF >> insert.sql
\set aid random(1, 10 * :scale)
\set delta random(1, 10 * :scale)
INSERT INTO pgbench_accounts (aid, bid, abalance) VALUES (:aid, :aid, :delta);
EOF
for c in 1 2 4 8 16 32 64 128 256 512 768 1024 2048 4096; do echo -n
"$c ";./pgbench -n -M prepared -U ubuntu postgres -b simple-update
-c$c -j$c -T5 2>&1|grep '^tps'|awk '{print $3}';done

[2]
 * records.  Tracking the WAL activity directly in WALInsertLock has the
 * advantage of not needing any additional locks to update the value.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Decouple-last-important-WAL-record-LSN-from-WAL-i.patch
Description: Binary data


Re: New docs chapter on Transaction Management and related changes

2022-11-23 Thread Bruce Momjian
On Wed, Nov 23, 2022 at 08:57:33AM +0100, Laurenz Albe wrote:
> On Tue, 2022-11-22 at 13:50 -0500, Bruce Momjian wrote:
> > Agreed, updated patch attached.
> 
> I cannot find any more problems, and I shouldn't mention the extra empty
> line at the end of the patch.

Fixed.  ;-)

> I'd change the commitfest status to "ready for committer" now if it were
> not already in that status.

I knew we would eventually get here.  The feedback has been very helpful
and I am excited about the content.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson




Re: drop postmaster symlink

2022-11-23 Thread Joe Conway

On 11/23/22 02:52, Peter Eisentraut wrote:

> A little while ago we discussed briefly over in the meson thread whether
> we could remove the postmaster symlink [0].  The meson build system
> currently does not install a postmaster symlink.  (AFAICT, the MSVC
> build system does not either.)  So if we want to elevate the meson build
> system, we either need to add the postmaster symlink, or remove it from
> the other build system(s) as well.  Seeing that it's been deprecated for
> a long time, I propose we just remove it.  See attached patches.


I am a big +1 on removing the symlink; however, it is worth pointing out
that the PGDG RPMs still use the symlink in the included systemd service
file:


8<--
ExecStart=/usr/pgsql-15/bin/postmaster -D ${PGDATA}
8<--

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





cleanup in open_auth_file

2022-11-23 Thread Ted Yu
Hi,
I was looking at the following commit:

commit efc981627a723d91e86865fb363d793282e473d1
Author: Michael Paquier 
Date:   Thu Nov 24 08:21:55 2022 +0900

Rework memory contexts in charge of HBA/ident tokenization

I think when the file cannot be opened, the context should be deleted.

Please see attached patch.

I also modified one comment where `deleted` would be a more appropriate
verb for the context.

Cheers


open-auth-file-cleanup.patch
Description: Binary data


Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2022-11-23 Thread Thomas Munro
On Thu, Nov 24, 2022 at 11:05 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Wed, Nov 23, 2022 at 11:03 PM Thomas Munro  
> > wrote:
> > As for what to do about it, some ideas:
> > 2.  Retry after a short time on checksum failure.  The probability is
> > already miniscule, and becomes pretty close to 0 if we read thrice
> > 100ms apart.
>
> > First thought is that 2 is appropriate level of complexity for this
> > rare and stupid problem.
>
> Yeah, I was thinking the same.  A variant could be "repeat until
> we see the same calculated checksum twice".

Hmm.  While writing a comment to explain why that's good enough, I
realised it's not really true for a standby that control file writes
are always expected to be far apart in time.  XLogFlush->
UpdateMinRecoveryPoint() could coincide badly with our N attempts for
any small N and for any nap time, which I think makes your idea better
than mine.

With some cartoon-level understanding of what's going on (to wit: I
think the kernel just pins the page but doesn't use a page-level
content lock or range lock, so what you're seeing is raw racing memcpy
calls and unsynchronised cache line effects), I guess you'd be fairly
likely to make "progress" in seeing more new data even if you didn't
sleep in between, but who knows.  So I have a 10ms sleep to make
progress very likely; given your algorithm it doesn't matter if you
didn't make all the progress, just some.  Since this is reachable from
SQL, I think we also need a CFI call so you can't get uninterruptibly
stuck here?
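
As a standalone sketch of that read-until-stable loop (the helper
names here are placeholders, not the real controldata_utils.c code):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pg_crc32c;

/* Placeholders standing in for the real read, CRC and sleep routines. */
extern void read_control_file_once(void *buf, size_t len);
extern pg_crc32c compute_file_crc(const void *buf, size_t len);
extern void sleep_ms(int ms);

/*
 * Re-read until two consecutive reads produce the same checksum, so a
 * torn read racing a concurrent write cannot be returned silently.
 */
void
read_control_file_stable(void *buf, size_t len)
{
	bool		first_try = true;
	pg_crc32c	last_crc = 0;

	for (;;)
	{
		pg_crc32c	crc;

		read_control_file_once(buf, len);
		crc = compute_file_crc(buf, len);
		if (!first_try && crc == last_crc)
			break;			/* stable: same CRC twice in a row */
		first_try = false;
		last_crc = crc;
		sleep_ms(10);		/* give an in-progress write time to finish */
	}
}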

I wrote a stupid throw-away function to force a write.  If you have an
ext4 system to hand (xfs, zfs, apfs, ufs, others don't suffer from
this) you can do:

 do $$ begin for i in 1..1000000 loop perform
pg_update_control_file(); end loop; end; $$;

... while you also do:

 select pg_control_system();
 \watch 0.001

... and you'll soon see:

ERROR:  calculated CRC checksum does not match value stored in file

The attached draft patch fixes it.
From 9f1856aeb56be888123702fe471d8388da66439f Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 24 Nov 2022 00:55:03 +
Subject: [PATCH 1/2] XXX Dirty hack to clobber control file for testing

---
 src/backend/access/transam/xlog.c | 10 ++
 src/include/catalog/pg_proc.dat   |  8 
 2 files changed, 18 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a31fbbff78..88de05ab35 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2502,6 +2502,16 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
 	LWLockRelease(ControlFileLock);
 }
 
+Datum
+pg_update_control_file()
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->minRecoveryPoint++;		/* XXX change something to affect CRC! */
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	PG_RETURN_VOID();
+}
+
 /*
  * Ensure that all XLOG data through the given position is flushed to disk.
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f15aa2dbb1..8177c1657c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11679,6 +11679,14 @@
   proargnames => '{pg_control_version,catalog_version_no,system_identifier,pg_control_last_modified}',
   prosrc => 'pg_control_system' },
 
+{ oid => '',
+  descr => 'update control file',
+  proname => 'pg_update_control_file', provolatile => 'v', prorettype => 'void',
+  proargtypes => '', proallargtypes => '',
+  proargmodes => '{}',
+  proargnames => '{}',
+  prosrc => 'pg_update_control_file' },
+
 { oid => '3442',
   descr => 'pg_controldata checkpoint state information as a function',
   proname => 'pg_control_checkpoint', provolatile => 'v',
-- 
2.35.1

From a63818a32d661dba563cedfdb85731e522b3c6a9 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 24 Nov 2022 13:28:22 +1300
Subject: [PATCH 2/2] Try to tolerate concurrent reads and writes of control
 file.

Various frontend programs and SQL-callable backend functions read the
control file without any kind of interlocking against concurrent writes.
Linux ext4 doesn't implement the atomicity required by POSIX here, so a
concurrent reader can see only partial effects of an in-progress write.

Tolerate this by retrying until we get two reads in a row with the same
checksum, after an idea from Tom Lane.

Reported-by: Andres Freund 
Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
---
 src/common/controldata_utils.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 2d1f35bbd1..200d24df02 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -56,12 +56,19 @@ get_controlfile(const char *DataDir, bool *crc_ok_p)
 	char		ControlFilePath[MAXPGPATH];
 	pg_crc32c	crc;
 	int			r;
+	bool		first_try;
+	pg_crc32c	last_crc;
 
 	Assert(crc_ok_p);
 
 	ControlFile = 

Re: Collation version tracking for macOS

2022-11-23 Thread Jeff Davis
On Wed, 2022-11-23 at 18:08 +1300, Thomas Munro wrote:

> (1) the default behaviour on failure to search would
> likely be to use the linked library instead and WARN about
> [dat]collversion mismatch, so far the same, and 

Agreed.

> (2) the set of people
> who would really be prepared to compile their own copy of 67.X
> instead
> of downgrading or REFRESHing (with or without rebuilding) is
> vanishingly small.

The set of people prepared to do so is probably small. But the set of
people who will do it (prepared or not) when a problem comes up is
significantly larger ;-)

> 1.  *Do* they change ucol_getVersion() values in minor releases?  I
> tried to find a written policy on that.

It seems like a valid concern. The mere existence of a collation
version separate from the library major version seems to suggest that
it's possible. Perhaps they avoid it in most cases; but absent a
specific policy against it, the separate collation version seems to
allow them the freedom to do so.

> This speculation feels pretty useless.  Maybe we should go and read
> the code or ask an ICU expert, but I'm not against making it
> theoretically possible to access two different minor versions at
> once,
> just to cover all the bases for future-proofing.

I don't think this should be an overriding concern that drives the
whole design. It is a nudge in favor of search-by-collversion.

> 2.  Would package managers ever allow two minor versions to be
> installed at once?  I highly doubt it; 

Agreed.

I'm sure this has been discussed, but which distros even support
multiple major versions of ICU?

> 
> 1.  search-by-collversion:  We introduce no new "library version"
> concept to COLLATION and DATABASE object and little or no new syntax.
> Whenever opening a collation or database, the system will search some
> candidate list of ICU libraries to try to find the one that agrees
> with [dat]collversion.

[...]

> The reason I prefer major[.minor] strings over whole library names is
> that we need to dlopen two of them so it's a little easier to build
> them from those parts than have to supply both names.

It also makes it easier to know which version suffixes to look for.

>   The reason I
> prefer to keep allowing major-only versions to be listed is that it's
> good to have the option to just follow minor upgrades automatically.

Makes sense.

> Or I guess you could make something that can automatically search a
> whole directory (which directory?) to find all the suitably named
> libraries so you don't ever have to mention versions manually (if you
> want "apt-get install libicu72" to be enough with no GUC change
> needed) -- is that too weird?

That seems to go a little too far.

>   SELECT * FROM pg_available_icu_libraries()
>   SELECT * FROM pg_available_icu_collation_versions('en')

+1

> 2.  lib-version-in-providers: We introduce a separate provider value
> for each ICU version, for example ICU63, plus an unversioned ICU like
> today.

I expressed interest in this approach before, but when you allowed ICU
compiled with --disable-renaming, that mitigated my concerns about when
to throw that error.

> 3.  lib-version-in-attributes: We introduce daticuversion (alongside
> datcollversion) and collicuversion (alongside collversion).

I think this is the best among 2-4.

> 4.  lib-version-in-locale:  "63:en" from earlier versions.  That was
> mostly a strawman proposal to avoid getting bogged down in
> syntax/catalogue/model change discussions while trying to prove that
> dlopen would even work.  It doesn't sound like anyone really likes
> this.

I don't see any advantage of this over 3.

> 5.  lib-version-in-collversion:  We didn't explicitly discuss this
> before, but you hinted at it: we could just use u_getVersion() in
> [dat]collversion.

The advantage here is that it's very easy to tell the admin what
library the collation is looking for, but the disadvantages you point
out seem a lot worse: migration problems from v15, and the minor
version question.



I'd vote for 1 on the grounds that it's easier to document and
understand a single collation version, which comes straight from
ucol_getVersion(). This approach makes it a separate problem to find
the collation version among whatever libraries the admin can provide;
but adding some observability into the search should mitigate any
confusion.

Can you go over the advantages of approaches 2-4 again? Is it just a
concern about burdening the admin with finding the right ICU library
version for a given collation version? That's a valid concern, but I
don't think that should be an overriding design point. It seems more
important to model the collation versions properly.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-23 Thread Michael Paquier
On Wed, Nov 23, 2022 at 03:56:50PM +0800, Julien Rouhaud wrote:
> The depth 0 is getting used quite a lot now, maybe we should have a define for
> it to make it easier to grep, like TOP_LEVEL_AUTH_FILE or something like that?
> And also add a define for the magical 10 for the max inclusion depth, for both
> auth files and GUC files while at it?

Sounds like a good idea to me, and it seems to me that this had better
be unified between the GUCs (see ParseConfigFp() that hardcodes a
depth of 0) and hba.c.  It looks like they could be added to
conffiles.h, as of CONF_FILE_START_{LEVEL,DEPTH} and
CONF_FILE_MAX_{LEVEL,DEPTH}.  Would you like to send a patch?
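
For instance, something as simple as this sketch (names as proposed above,
values from the existing hardcoded 0 and 10; not a committed patch):

/* in conffiles.h */
#define CONF_FILE_START_DEPTH	0	/* depth of the outermost file */
#define CONF_FILE_MAX_DEPTH		10	/* maximum nesting depth for inclusions */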

> The comment seems a bit ambiguous as with "loading the top..." you probably
> meant something like "loading the file in memory" rather than "(re)loading the
> configuration".  Maybe s/loading/opening/?

Right.  I have used "opening" at the end.

I have worked on all that, did more polishing to the docs, the
comments and some tiny bits of the logic, and applied both 0001 and
0002.  Now, to the tests..

> Mmm, I haven't looked deeply so I'm not sure if the perl modules are aware of
> it or not, but maybe we could somehow detect the used delimiter at the
> beginning after normalizing the directory, and use a $DELIM rather than a 
> plain
> "/"?

I am not sure.  Could you have a look and see if you can get the CI
back to green?  The first thing I would test is to switch the error
patterns to be regexps based on the basenames rather than the full
paths (tweaking the queries on the system views to do that), so that we
avoid all this business with slash and backslash transformations.
--
Michael


signature.asc
Description: PGP signature


Re: Fix order of checking ICU options in initdb and create database

2022-11-23 Thread Peter Eisentraut

On 19.11.22 20:36, Марина Полякова wrote:

Here is another set of proposed patches:

v2-0001-Fix-encoding-check-in-initdb-when-the-option-icu-.patch
Target: PG 15+
Fix encoding check in initdb when the option --icu-locale is not used:


I'm having a hard time figuring out from your examples what you are 
trying to change.  Which one is the "before" example and which one is 
the "after", and which aspect specifically is the issue and what 
specifically is being addressed?  I tried out the examples in the 
current code and didn't find anything obviously wrong in the behavior.


I'm concerned that the initdb.c changes are a bit unprincipled.  They 
just move code around to achieve some behavior without keeping the 
overall structure in mind.  For example, check_icu_locale_encoding() 
intentionally had the same API as check_locale_encoding(), but now 
that's being changed.  And setlocales() did all the locale parameter 
validity checking, but now part of that is being moved around.  I'm 
afraid this makes initdb.c even more spaghetti code than it already is.


What about those test changes?  I can't tell if they are related. 
createdb isn't being changed; is that test change related or separate?



v2-0002-doc-building-without-ICU-is-not-recommended.patch
Target: PG 15+
Fix the documentation that --without-icu is a marginal configuration.

v2-0003-Build-with-ICU-by-default.patch
Target: PG 16+
Build with ICU by default as already done for readline and zlib libraries.


We are not going to make these kinds of changes specifically for ICU. 
I'd say, for example, the same applies to --with-openssl and --with-lz4, 
and probably others.  If this is an issue, then we need a more general 
approach than just ICU.  This should be a separate thread in any case.





Re: fixing CREATEROLE

2022-11-23 Thread walther

Robert Haas:

I have to admit that when I realized that was the natural place to put
them to make the patch work, my first reaction internally was "well,
that can't possibly be right, role properties suck!". But I didn't and
still don't see where else to put them that makes any sense at all, so
I eventually decided that my initial reaction was misguided. So I
can't really blame you for not liking it either, and would be happy if
we could come up with something else that feels better. I just don't
know what it is: at least as of this moment in time, I believe these
naturally ARE properties of the role [...]

That might be the wrong view. As I say, I'm open to other ideas, and
it's possible there's some nicer way to do it that I just don't see
right now.


INHERITCREATEDROLES and SETCREATEDROLES behave much like DEFAULT 
PRIVILEGES. What about something like:


ALTER DEFAULT PRIVILEGES FOR alice
GRANT TO alice WITH INHERIT FALSE, SET TRUE, ADMIN TRUE

The "abbreviated grant" is very much abbreviated, because the original 
syntax GRANT a TO b is already quite short to begin with, i.e. there is 
no ON ROLE or something like that in it.


The initial DEFAULT privilege would be INHERIT FALSE, SET FALSE, ADMIN 
TRUE, I guess?


Best,

Wolfgang




Re: PGDOCS - Logical replication GUCs - added some xrefs

2022-11-23 Thread Peter Smith
On Wed, Nov 23, 2022 at 9:16 AM Peter Smith  wrote:
>
> On Wed, Nov 16, 2022 at 10:24 PM vignesh C  wrote:
> >
> ...
>
> > One suggestion:
> > The format of subscribers includes the data type and default values,
> > the format of publishers does not include data type and default
> > values. We can try to maintain the consistency for both publisher and
> > subscriber configurations.
> > +   
> > +wal_level must be set to logical.
> > +   
> >
> > + max_logical_replication_workers
> > (integer)
> > + 
> > +  max_logical_replication_workers
> > configuration parameter
> > + 
> > + 
> > + 
> > +  
> > +   Specifies maximum number of logical replication workers. This
> > must be set
> > +   to at least the number of subscriptions (for apply workers), plus 
> > some
> > +   reserve for the table synchronization workers.
> > +  
> > +  
> >
> > If we don't want to keep the same format, we could give a link to
> > runtime-config-replication where data type and default is defined for
> > publisher configurations max_replication_slots and max_wal_senders.
> >
>
> Thanks for your suggestions.
>
> I have included xref links to the original definitions, rather than
> defining the same GUC in multiple places.
>
> PSA v3.
>

I updated the patch. The content is unchanged from v3 but the links
are modified so now they render with the correct  format for
the GUC names.

PSA v4.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v4-0001-Logical-replication-GUCs-consolidated.patch
Description: Binary data


Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

2022-11-23 Thread Thomas Munro
On Thu, Nov 24, 2022 at 2:02 PM Thomas Munro  wrote:
> ... and you'll soon see:
>
> ERROR:  calculated CRC checksum does not match value stored in file

I forgot to mention: this reproducer only seems to work if fsync =
off.  I don't know why, but I recall that was true also for bug
#17064.




Re: Bug in row_number() optimization

2022-11-23 Thread Richard Guo
On Tue, Nov 22, 2022 at 3:44 PM Richard Guo  wrote:

> On Wed, Nov 16, 2022 at 7:38 AM Sergey Shinderuk <
> s.shinde...@postgrespro.ru> wrote:
>
>> The failing query is:
>> SELECT * FROM
>>(SELECT *,
>>count(salary) OVER (PARTITION BY depname || '') c1, -- w1
>>row_number() OVER (PARTITION BY depname) rn, -- w2
>>count(*) OVER (PARTITION BY depname) c2, -- w2
>>count(*) OVER (PARTITION BY '' || depname) c3 -- w3
>> FROM empsalary
>> ) e WHERE rn <= 1 AND c1 <= 3;
>> As far as I understand, ExecWindowAgg for the intermediate WindowAgg
>> node switches into pass-through mode, stops evaluating row_number(), and
>> returns the previous value instead. But if int8 is passed by reference,
>> the previous value stored in econtext->ecxt_aggvalues becomes a dangling
>> pointer when the per-output-tuple memory context is reset.
>
>
> Yeah, you're right.  In this example the window function row_number()
> goes into pass-through mode after the second evaluation because its
> run condition does not hold true any more.  The remaining run would just
> return the result from the second evaluation, which is stored in
> econtext->ecxt_aggvalues[wfuncno].
>
> If int8 is configured as pass-by-ref, the precomputed value from the
> second evaluation is actually located in a memory area from context
> ecxt_per_tuple_memory, with its pointer stored in ecxt_aggvalues.  As
> this memory context is reset once per tuple, we would be prone to wrong
> results.
>

Regarding how to fix this problem, firstly I believe we need to evaluate
window functions in the per-tuple memory context, as the HEAD does.
When we decide we need to go into pass-through mode, I'm thinking that
we can just copy out the results of the last evaluation to the per-query
memory context, while still storing their pointers in ecxt_aggvalues.
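
A rough sketch of that idea, assuming the WindowAgg code has the result
type metadata at hand (resulttypeByVal/resulttypeLen here are stand-ins,
and this is a fragment, not the final fix):

if (!resulttypeByVal && !econtext->ecxt_aggnulls[wfuncno])
{
	MemoryContext oldcontext;

	/* copy the by-ref result out of the per-tuple context before it resets */
	oldcontext = MemoryContextSwitchTo(econtext->ecxt_per_query_memory);
	econtext->ecxt_aggvalues[wfuncno] =
		datumCopy(econtext->ecxt_aggvalues[wfuncno],
				  false,	/* typByVal */
				  resulttypeLen);
	MemoryContextSwitchTo(oldcontext);
}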

Does this idea work?

Thanks
Richard


RE: failures in t/031_recovery_conflict.pl on CI

2022-11-23 Thread Факеев Алексей
Hello.

Even after applying the patch, we are still facing an "ack Broken pipe"
problem. 
It occurs on the arm64 platform, presumably under high load. 
Here is a log snippet from buildfarm:
...
[19:08:12.150](0.394s) ok 13 - startup deadlock: cursor holding conflicting
pin, also waiting for lock, established
[19:08:12.208](0.057s) ok 14 - startup deadlock: lock acquisition is waiting
Waiting for replication conn standby's replay_lsn to pass 0/33F5FE0 on
primary
done
psql:<stdin>:8: ERROR:  canceling statement due to conflict with recovery
LINE 1: SELECT * FROM test_recovery_conflict_table2;
  ^
DETAIL:  User transaction caused buffer deadlock with recovery.
[19:08:12.319](0.112s) ok 15 - startup deadlock: logfile contains terminated
connection due to recovery conflict
ack Broken pipe: write( 13, '\\q
' ) at /usr/share/perl5/IPC/Run/IO.pm line 550.
### Stopping node "primary" using mode immediate
# Running: pg_ctl -D
/home/bf/build/buildfarm-flaviventris/REL_15_STABLE/pgsql.build/src/test/rec
overy/tmp_check/t_031_recovery_conflict_primary_data/pgdata -m immediate
stop
waiting for server to shut down... done
server stopped
# No postmaster PID for node "primary"
### Stopping node "standby" using mode immediate
# Running: pg_ctl -D
/home/bf/build/buildfarm-flaviventris/REL_15_STABLE/pgsql.build/src/test/rec
overy/tmp_check/t_031_recovery_conflict_standby_data/pgdata -m immediate
stop
waiting for server to shut down done
server stopped
# No postmaster PID for node "standby"
[19:08:12.450](0.131s) # Tests were run but no plan was declared and
done_testing() was not seen.
[19:08:12.450](0.000s) # Looks like your test exited with 32 just after 15.
...

Below is a test report:
... 
[20:46:35] t/030_stats_cleanup_replica.pl ... ok 9956 ms ( 0.01 usr
0.00 sys +  3.49 cusr  2.49 csys =  5.99 CPU)
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 32 just after 15.
[20:46:43] t/031_recovery_conflict.pl ... 
Dubious, test returned 32 (wstat 8192, 0x2000)
All 15 subtests passed 
[20:46:56] t/032_relfilenode_reuse.pl ... ok13625 ms ( 0.01 usr
0.00 sys +  3.53 cusr  2.39 csys =  5.93 CPU)
[20:47:03] t/110_wal_sender_check_crc.pl  ok 6421 ms ( 0.00 usr
0.00 sys +  3.20 cusr  1.87 csys =  5.07 CPU)
[20:47:03]

Test Summary Report
---
t/031_recovery_conflict.pl (Wstat: 8192 Tests: 15 Failed: 0)
  Non-zero exit status: 32
  Parse errors: No plan found in TAP output




-Original Message-
From: Andres Freund  
Sent: Tuesday, May 3, 2022 11:13 PM
To: Tom Lane 
Cc: Robert Haas ; pgsql-hack...@postgresql.org;
Thomas Munro 
Subject: Re: failures in t/031_recovery_conflict.pl on CI

Hi,

On 2022-05-03 14:23:23 -0400, Tom Lane wrote:
> Andres Freund  writes:
> >> So it's almost surely a timing issue, and your theory here seems
plausible.
> 
> > Unfortunately I don't think my theory holds, because I actually had 
> > added a defense against this into the test that I forgot about
momentarily...
> 
> Oh, hm.  I can try harder to repro it.

I've now reproduced it a couple times here running under rr, so it's
probably not worth putting too much effort into that.

Attached is a fix for the test that I think should avoid the problem.
Couldn't repro it with it applied, under both rr and valgrind.


My current problem is that I'm running into some IPC::Run issues (bug?). I
get "ack Broken pipe:" iff I add "SELECT pg_sleep(1);" after
"-- wait for lock held by prepared transaction"

It doesn't happen without that debugging thing, but it makes me worried that
it's something that'll come up in random situations.

It looks to me like it's a bug in IPC::Run - with a few changes I get the
failure to happen inside pump_nb(), which seems like it shouldn't error out
just because the child process exited...


I *think* it might not happen without the sleep. But I'm not at all
confident.

In general I'm kinda worried about how much effectively unmaintained perl
stuff we're depending on :(

Greetings,

Andres Freund





Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-23 Thread Julien Rouhaud
On Thu, Nov 24, 2022 at 02:07:21PM +0900, Michael Paquier wrote:
> On Wed, Nov 23, 2022 at 03:56:50PM +0800, Julien Rouhaud wrote:
> > The depth 0 is getting used quite a lot now, maybe we should have a define 
> > for
> > it to make it easier to grep, like TOP_LEVEL_AUTH_FILE or something like 
> > that?
> > And also add a define for the magical 10 for the max inclusion depth, for 
> > both
> > auth files and GUC files while at it?
> 
> Sounds like a good idea to me, and it seems to me that this had better
> be unified between the GUCs (see ParseConfigFp() that hardcodes a
> depth of 0) and hba.c.  It looks like they could be added to
> conffiles.h, as of CONF_FILE_START_{LEVEL,DEPTH} and
> CONF_FILE_MAX_{LEVEL,DEPTH}.  Would you like to send a patch?

Agreed, and yes I will take care of that shortly.

> > The comment seems a bit ambiguous as with "loading the top..." you probably
> > meant something like "loading the file in memory" rather than "(re)loading 
> > the
> > configuration".  Maybe s/loading/opening/?
> 
> Right.  I have used "opening" at the end.
> 
> I have worked on all that, did more polishing to the docs, the
> comments and some tiny bits of the logic, and applied both 0001 and
> 0002.

Thanks a lot!

> Now, to the tests..
> 
> > Mmm, I haven't looked deeply so I'm not sure if the perl modules are aware 
> > of
> > it or not, but maybe we could somehow detect the used delimiter at the
> > beginning after normalizing the directory, and use a $DELIM rather than a 
> > plain
> > "/"?
> 
> I am not sure.  Could you have a look and see if you can get the CI
> back to green?  The first thing I would test is to switch the error
> patterns to be regexps based on the basenames rather than the full
> paths (tweaking the queries on the system views to do that), so that we
> avoid all this business with slash and backslash transformations.

I'm also working on it!  Hopefully I'll be able to come up with a fix soon.




Re: drop postmaster symlink

2022-11-23 Thread Daniel Gustafsson
> On 23 Nov 2022, at 21:10, Robert Haas  wrote:

> I don't actually care very much whether we get rid of the postmaster
> symlink or not, but if we aren't going to, we should stop calling it
> deprecated. If 15 years isn't enough time to remove it, what ever will
> be?

+1.  If we actively add support for something then it isn't really all that
deprecated IMHO.

--
Daniel Gustafsson   https://vmware.com/





Re: ssl tests aren't concurrency safe due to get_free_port()

2022-11-23 Thread Andrew Dunstan


On 2022-11-22 Tu 20:36, Tom Lane wrote:
> Andres Freund  writes:
>> While looking into a weird buildfarm failure ([1]), I noticed this:
>> # Checking port 62707
>> Use of uninitialized value $pid in scalar chomp at 
>> /mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
>>  line 1247.
>> Use of uninitialized value $pid in addition (+) at 
>> /mnt/resource/bf/build/grassquit/REL_11_STABLE/pgsql.build/../pgsql/src/test/perl/PostgresNode.pm
>>  line 1248.
> Yeah, my animals are showing that too.
>
>> Not quite sure how $pid ends up uninitialized, given the code:
>>  # see if someone else has or had a reservation of this port
>>  my $pid = <$portfile>;
>>  chomp $pid;
>>  if ($pid +0 > 0)
> I guess the <$portfile> might return undef if the file is empty?
>
>   


Yeah, should be fixed now.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: More efficient build farm animal wakeup?

2022-11-23 Thread Andrew Dunstan


On 2022-11-23 We 16:59, Tom Lane wrote:
> Thomas Munro  writes:
>> On Thu, Nov 24, 2022 at 10:00 AM Magnus Hagander  wrote:
>>> On Wed, Nov 23, 2022 at 9:15 AM Thomas Munro  wrote:
>>> Are you saying you still think it's worth pursuing longpoll or similar 
>>> methods for it, or that this is good enough?
>> I personally think it'd be pretty neat, to squeeze out that last bit
>> of latency.  Maybe it's overkill...
> I can't get excited about pursuing the last ~30 seconds of delay
> for launching tasks that are going to run 10 or 20 or more minutes
> (where the future trend of those numbers is surely up not down).
>
> The thing that was really significantly relevant here IMO was to
> reduce the load on the central server, and I think we've done that.
> Would adding longpoll reduce that further?  In principle maybe,
> but I'm not sure we have enough animals to make it worthwhile.
>
>   


Yeah, that's my feeling. We have managed to get a large improvement with
a fairly small effort, I'm much less excited about getting another small
improvement from a large effort.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Hash index build performance tweak from sorting

2022-11-23 Thread David Rowley
On Thu, 24 Nov 2022 at 02:27, Simon Riggs  wrote:
>
> On Wed, 23 Nov 2022 at 13:04, David Rowley  wrote:

> > I'd rather see this solved like v4 is doing it.
>
> Please do. No further comments. Thanks for your help

Thanks. I pushed the v4 patch with some minor comment adjustments and
also renamed _hash_pgaddtup()'s new parameter to "appendtup". I felt
that better reflected what the parameter does in that function.

David




Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-11-23 Thread Andrey Chudnovsky
> How does this differ from the previous proposal? The OAUTHBEARER SASL
> mechanism already relies on OIDC for discovery. (I think that decision
> is confusing from an architectural and naming standpoint, but I don't
> think they really had an alternative...)
Mostly terminology questions here. OAUTHBEARER SASL appears to be the
spec about using OAUTH2 tokens for Authentication.
While any OAUTH2 provider can generally work, we propose to specifically
highlight that only OIDC providers can be supported, as we need the
discovery document.
And we won't be able to support Github under that requirement.
The original patch relied on discovery too, so this is no change - just
confirmation that we need OIDC compliance.

> 0) The original hook proposal upthread, I thought, was about allowing
> libpq's flow implementation to be switched out by the application. I
> don't see that approach taken here. It's fine if that turned out to be a
> bad idea, of course, but this patch doesn't seem to match what we were
> talking about.
We still plan to allow the client to pass the token, which is a generic
way for it to implement its own OAUTH flows.
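
Purely as an illustration of that, an application running its own flow
could hand the resulting token to libpq along these lines (the
"oauth_bearer_token" keyword is hypothetical, not an existing libpq
option):

#include <libpq-fe.h>

PGconn *
connect_with_token(const char *token)
{
	/* token obtained by the application through its own OAuth flow */
	const char *keywords[] = {"host", "dbname", "oauth_bearer_token", NULL};
	const char *values[] = {"db.example.com", "postgres", token, NULL};

	return PQconnectdbParams(keywords, values, 0);
}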

> 1) I'm really concerned about the sudden explosion of flows. We went
> from one flow (Device Authorization) to six. It's going to be hard
> enough to validate that *one* flow is useful and can be securely
> deployed by end users; I don't think we're going to be able to maintain
> six, especially in combination with my statement that iddawc is not an
> appropriate dependency for us.

> I'd much rather give applications the ability to use their own OAuth
> code, and then maintain within libpq only the flows that are broadly
> useful. This ties back to (0) above.
We consider the following set of flows to be minimum required:
- Client Credentials - For Service to Service scenarios.
- Authorization Code with PKCE - For rich clients,including pgAdmin.
- Device code - for psql (and possibly other non-GUI clients).
- Refresh code (separate discussion)
Which is pretty much the list described here:
https://oauth.net/2/grant-types/ and in OAUTH2 specs.
Client Credentials is very simple, as is Refresh Code.
If you prefer to pick one of the richer flows, Authorization code for
GUI scenarios is probably much more widely used.
Plus it's easier to implement too, as interaction goes through a
series of callbacks. No polling required.

> 2) Breaking the refresh token into its own pseudoflow is, I think,
> passing the buck onto the user for something that's incredibly security
> sensitive. The refresh token is powerful; I don't really want it to be
> printed anywhere, let alone copy-pasted by the user. Imagine the
> phishing opportunities.

> If we want to support refresh tokens, I believe we should be developing
> a plan to cache and secure them within the client. They should be used
> as an accelerator for other flows, not as their own flow.
It's considered a separate "grant_type" in the specs / APIs.
https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens

For the clients, this would mean storing the token and using it to authenticate.
On the question of sensitivity, secure credentials stores are
different for each platform, with a lot of cloud offerings for this.
pgAdmin, for example, has its own way to secure credentials to avoid
asking users for passwords every time the app is opened.
I believe we should delegate the refresh token management to the clients.

>3) I don't like the departure from the OAUTHBEARER mechanism that's
> presented here. For one, since I can't see a sample plugin that makes
> use of the "flow type" magic numbers that have been added, I don't
> really understand why the extension to the mechanism is necessary.
I don't think it's much of a departure, but rather a separation of
responsibilities between libpq and upstream clients.
As libpq can be used in different apps, the client would need
different types of flows/grants.
I.e. those need to be provided to libpq at connection initialization
or some other point.
We will change to "grant_type" though, and use a string to be closer to the spec.
What do you think is the best way for the client to signal which OAUTH
flow should be used?

On Wed, Nov 23, 2022 at 12:05 PM Jacob Champion  wrote:
>
> On 11/23/22 01:58, mahendrakar s wrote:
> > We validated on  libpq handling OAuth natively with different flows
> > with different OIDC certified providers.
> >
> > Flows: Device Code, Client Credentials and Refresh Token.
> > Providers: Microsoft, Google and Okta.
>
> Great, thank you!
>
> > Also validated with OAuth provider Github.
>
> (How did you get discovery working? I tried this and had to give up
> eventually.)
>
> > We propose using OpenID Connect (OIDC) as the protocol, instead of
> > OAuth, as it is:
> > - Discovery mechanism to bridge the differences and provide metadata.
> > - Stricter protocol and certification process to reliably identify
> > which providers can be supported.
> > - OIDC is designed for authentication, while the main 

Re: Collation version tracking for macOS

2022-11-23 Thread Thomas Munro
On Thu, Nov 24, 2022 at 3:07 PM Jeff Davis  wrote:
> I'm sure this has been discussed, but which distros even support
> multiple major versions of ICU?

For Debian and friends, you can install any number of libicuNN
packages (if you can find them eg from previous release repos), but
there's only one libicu-dev.  That means that one specific major
version is blessed by each Debian release and has its headers and
static libraries for you to use as a developer, but you can still
install the dynamic libraries from older releases at the same time to
satisfy the dependencies of packages or programs that were built on an
earlier OS release.  They don't declare conflicts on each other and
they contain non-conflicting filenames.  That's similar to the way
standard libraries and various other things are treated, for backward
compatibility.

For RHEL and friends, I'm pretty sure it's the same concept, but I
don't use those and haven't seen it with my own eyes.

I don't know for other Linux distros/families, but I expect the above
two cover a huge percentage of our users and I expect others to have
made similar choices.

For the BSDs, which tend to have a single binary package with both
headers and libraries owing to their origins as source-based
distributions (ports), the above way of thinking doesn't work; I
couldn't develop this on my usual FreeBSD battlestation without
building ICU myself (problem being that there's only one "pkg install
icu") and I hope to talk to someone who knows what to do about that
eventually.  I want this to work there easily for end users.

macOS and Windows have so many different ways of installing things
that there isn't a single answer there; supposedly open source is like
a bazaar and closed source like a cathedral, but as far as package
management goes, it looks more like rubble to me.




Re: Collation version tracking for macOS

2022-11-23 Thread Thomas Munro
On Thu, Nov 24, 2022 at 3:07 PM Jeff Davis  wrote:
> I'd vote for 1 on the grounds that it's easier to document and
> understand a single collation version, which comes straight from
> ucol_getVersion(). This approach makes it a separate problem to find
> the collation version among whatever libraries the admin can provide;
> but adding some observability into the search should mitigate any
> confusion.

OK, it sounds like I should code that up next.

> Can you go over the advantages of approaches 2-4 again? Is it just a
> concern about burdening the admin with finding the right ICU library
> version for a given collation version? That's a valid concern, but I
> don't think that should be an overriding design point. It seems more
> important to model the collation versions properly.

Yes, that's a good summary.  The user has a problem, and the solution
is to find some version of ICU and install it, so the problem space
necessarily involves the other kind of version.  My idea was that we
should therefore make that part of the model.  But the observability
support does indeed make it a bit clearer what's going on.
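
To make the search concrete, here is a hedged sketch of what a
per-library probe could look like; the library/symbol naming and the raw
4-byte version comparison are assumptions, not the actual patch:

#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct UCollator UCollator;	/* opaque */
typedef UCollator *(*ucol_open_fn) (const char *loc, int *status);
typedef void (*ucol_getversion_fn) (const UCollator *coll, uint8_t *info);

/* Try libicui18n.so.MAJOR; keep it only if it reproduces collversion. */
static void *
try_icu_major(int major, const char *locale, const uint8_t stored_version[4])
{
	char		name[64];
	void	   *handle;
	ucol_open_fn open_fn;
	ucol_getversion_fn getversion_fn;
	UCollator  *coll;
	uint8_t		actual[4];
	int			status = 0;

	snprintf(name, sizeof(name), "libicui18n.so.%d", major);
	if ((handle = dlopen(name, RTLD_NOW | RTLD_LOCAL)) == NULL)
		return NULL;

	/* ICU normally suffixes its symbols with the major version. */
	snprintf(name, sizeof(name), "ucol_open_%d", major);
	open_fn = (ucol_open_fn) dlsym(handle, name);
	snprintf(name, sizeof(name), "ucol_getVersion_%d", major);
	getversion_fn = (ucol_getversion_fn) dlsym(handle, name);

	if (open_fn == NULL || getversion_fn == NULL ||
		(coll = open_fn(locale, &status)) == NULL || status > 0)
	{
		dlclose(handle);
		return NULL;
	}
	getversion_fn(coll, actual);
	if (memcmp(actual, stored_version, 4) == 0)
		return handle;		/* this library agrees with [dat]collversion */

	dlclose(handle);
	return NULL;
}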




Allow processes to reset procArrayGroupNext themselves instead of leader resetting for all the followers

2022-11-23 Thread Bharath Rupireddy
Hi,

While working on something else, I noticed that the proc array group
XID clearing leader resets procArrayGroupNext of all the followers
atomically along with procArrayGroupMember. ISTM that it's enough for
the followers to exit the wait loop and continue if the leader resets
just procArrayGroupMember; the followers can reset procArrayGroupNext
by themselves. This relieves the leader a bit, especially when there
are many followers, as it avoids a bunch of atomic writes and
pg_write_barrier() calls for the leader.
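
In code form, a heavily abbreviated sketch of the protocol with this
change applied (names as in procarray.c; the attached patch has the real
diff):

/* follower, once its semaphore wait loop has ended: */
pg_atomic_write_u32(&proc->procArrayGroupNext, INVALID_PGPROCNO);

/* leader, walking the wakeup list: */
while (wakeidx != INVALID_PGPROCNO)
{
	PGPROC	   *nextproc = &allProcs[wakeidx];

	wakeidx = pg_atomic_read_u32(&nextproc->procArrayGroupNext);

	/* no reset of procArrayGroupNext and no pg_write_barrier() here */
	nextproc->procArrayGroupMember = false;

	if (nextproc != MyProc)
		PGSemaphoreUnlock(nextproc->sem);
}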

I'm attaching a small patch with the above change. It doesn't seem to
break anything, and the cirrus-ci runs are happy:
https://github.com/BRupireddy/postgres/tree/allow_processes_to_reset_proc_array_v1.
It doesn't show a visible benefit on my system, nor does it hurt, in my
testing [1].

Thoughts?

[1]
# of clients    HEAD        PATCHED
1                31661       31512
2                67134       66789
4               135084      132372
8               254174      255384
16              418598      420903
32              491922      494183
64              509824      509451
128             513298      512838
256             505191      496266
512             465208      453588
768             431304      425736
1024            398110      397352
2048            308732      308901
4096            200355      219480

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 097e74e5d1477e6670cf1bcb9316a4a850c960ba Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 23 Nov 2022 14:08:19 +
Subject: [PATCH v1] Allow processes to reset procArrayGroupNext themselves
 instead of leader

---
 src/backend/storage/ipc/procarray.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 06918759e7..459ac8ead9 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -826,7 +826,9 @@ ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
 		}
 		pgstat_report_wait_end();
 
-		Assert(pg_atomic_read_u32(&proc->procArrayGroupNext) == INVALID_PGPROCNO);
+		Assert(pg_atomic_read_u32(&proc->procArrayGroupNext) != INVALID_PGPROCNO);
+
+		pg_atomic_write_u32(&proc->procArrayGroupNext, INVALID_PGPROCNO);
 
 		/* Fix semaphore count for any absorbed wakeups */
 		while (extraWaits-- > 0)
@@ -874,10 +876,6 @@ ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
 		PGPROC	   *nextproc = &allProcs[wakeidx];
 
 		wakeidx = pg_atomic_read_u32(&nextproc->procArrayGroupNext);
-		pg_atomic_write_u32(&nextproc->procArrayGroupNext, INVALID_PGPROCNO);
-
-		/* ensure all previous writes are visible before follower continues. */
-		pg_write_barrier();
 
 		nextproc->procArrayGroupMember = false;
 
-- 
2.34.1


