Re: \describe*

2019-06-21 Thread David Fetter
On Fri, Jun 21, 2019 at 05:49:43PM -0400, Alvaro Herrera wrote:
> On 2019-Jun-21, David Fetter wrote:
> 
> > On Fri, Jun 21, 2019 at 05:20:54PM -0400, Alvaro Herrera wrote:
> > > On 2018-Jan-29, David Fetter wrote:
> > > 
> > > > We could certainly have \d call DESCRIBE for later versions of the
> > > > server.  \ commands which call different SQL depending on server
> > > > version have long been a standard practice.
> > > 
> > > So what is the uptake on implementing this at the server side, ie.
> > > DESCRIBE?
> > 
> > I've got a few Round Tuits available this weekend.  This seems like a
> > worthwhile thing to spend them on.
> 
> That's great, but my question is whether you managed to convince anyone
> whether it's a good idea.

Everybody who's used MySQL will.  In some sense, I'm more concerned
about the users in the future, who I hope vastly outnumber the users
in the present and past.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Usage of epoch in txid_current

2019-06-21 Thread Alexander Korotkov
On Thu, Mar 28, 2019 at 8:30 AM Thomas Munro wrote:
> Thanks for the reviews!  Pushed.

Any thoughts on whether we should move towards 64-bit xids in more
places?  That has been discussed several times already.  I think the
last time it was discussed in person was during the FOSDEM PGDay 2018
Developer Meeting [1].  There we concluded that it's probably not
worth changing the 32-bit on-disk xids in heap.  It's better to leave
the existing heap "as is", but allow other pluggable table access
methods to support 64-bit xids.  Given that we now have pluggable
table access methods, we may build a plan for full support of 64-bit
xids in core.

In my vision, a sketch of the plan might look like this.

1. Change all non-heap types of xids from TransactionId to
FullTransactionId.  But in shared memory change TransactionId to
pg_atomic_uint64 in order to guarantee atomicity of access, which we
require in multiple places.
2. Also introduce FullMultixactId, and apply a change similar to #1
to MultixactId.
3. Change SLRU on-disk format to handle 64-bit xids and multixacts.
In particular make SLRU page numbers 64-bit too.  Add SLRU upgrade
procedure to pg_upgrade.
4. Change relfrozenxid and relminmxid to something like rellimitxid
and rellimitmxid, so we don't imply a 32-bit restriction on xids.
Instead, we let the table AM place (or not place) a limit on
advancing nextXid and nextMultixact.
5. The table AM API would be switched to 64-bit xids/multixacts, while
heap will keep using 32 bits.  So heap should convert a 32-bit xid
value to 64 bits based on nextXid/nextMultixact.  Assuming we set
rellimitxid and rellimitmxid for a relation to oldestxid + 2^32 and
oldestmxid + 2^32, we may safely assume that the 32-bit values of
xids/multixacts are within the 2^32 range before nextXid/nextMultixact.
Thanks to this, even in heap we would be able to operate with 2^32
xids/multixacts simultaneously instead of the 2^31 we have now.

So, this is how I see the possible plan.  I would be glad to see feedback.

Unfortunately, I don't have enough time to implement all of this.  But
I think there are many parties interested in finally having 64-bit
xids in core, especially now that we have pluggable table AMs: it
would be ridiculous to spread the limitation of 32-bit xids to new
table AMs.

Links

1. https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2018_Developer_Meeting

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: POC: Cleaning up orphaned files using undo logs

2019-06-21 Thread Tom Lane
Robert Haas writes:
> On Thu, Jun 20, 2019 at 4:54 AM Dilip Kumar wrote:
>> The idea behind having the loop inside the undo machinery was that
>> while traversing the blkprev chain, we can read all the undo records
>> on the same undo page under one buffer lock.

> That's not a bad goal, although invoking a user-supplied callback
> while holding a buffer lock is a little scary.

I nominate Robert for Understater of the Year.  I think there's pretty
much 0 chance of that working reliably.

regards, tom lane




Re: \describe*

2019-06-21 Thread Tom Lane
Alvaro Herrera writes:
> On 2018-Jan-29, David Fetter wrote:
>> We could certainly have \d call DESCRIBE for later versions of the
>> server.  \ commands which call different SQL depending on server
>> version have long been a standard practice.

> So what is the uptake on implementing this at the server side, ie.
> DESCRIBE?

I'm pretty skeptical of this idea, unless you are willing to throw
away at least one and possibly both of the following goals:

1. Compatibility with psql's existing \d behavior.

2. Usability of DESCRIBE for any purpose whatsoever other than emitting
something that looks just like what psql prints.

We've migrated many of the \d displays so far away from "a single query
result" that I don't believe there's a way for a server command to
duplicate them, at least not without some seriously unholy in-bed-ness
between the server command and some postprocessing logic in describe.c.
(At which point you've lost whatever system architectural value there
might be in the whole project, since having a more-arm's-length
relationship there kinda seems like the point to me.)

There are a bunch of other little behavioral differences that you just
can't replicate server-side, like the fact that localization of the
results depends on psql's LC_MESSAGES not the server's.  Maybe people
would be okay with changing that, but it's not a transparent
reimplementation.

I think if we want to have server-side describe capability, we're better
off just to implement a DESCRIBE command that's not intended to be exactly
like \d anything, and not try to make it be the implementation for \d
anything.  (This was, in fact, where David started IIUC.  Other people's
sniping at that idea hasn't yielded any better idea.)

In particular, I'm really strongly against having "\describe-foo-bar"
invoke DESCRIBE, because (a) that will break compatibility with the
existing \des command, and (b) it's not actually saving any typing,
and (c) I think it'd confuse users no end.

Of course, this line of thought does lead to the conclusion that we'd be
maintaining psql/describe.c and server-side DESCRIBE in parallel forever,
which doesn't sound like fun.  But we should be making DESCRIBE with an
eye to more use-cases than psql.  If it allows jdbc to not also maintain
a pile of equivalent code, that'd be a win.  If it allows pg_dump to toss
a bunch of logic overboard (or at least stop incrementally adding new
variants), that'd be a big win.

regards, tom lane




Re: \describe*

2019-06-21 Thread Alvaro Herrera
On 2019-Jun-21, David Fetter wrote:

> On Fri, Jun 21, 2019 at 05:20:54PM -0400, Alvaro Herrera wrote:
> > On 2018-Jan-29, David Fetter wrote:
> > 
> > > We could certainly have \d call DESCRIBE for later versions of the
> > > server.  \ commands which call different SQL depending on server
> > > version have long been a standard practice.
> > 
> > So what is the uptake on implementing this at the server side, ie.
> > DESCRIBE?
> 
> I've got a few Round Tuits available this weekend.  This seems like a
> worthwhile thing to spend them on.

That's great, but my question is whether you managed to convince anyone
whether it's a good idea.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: \describe*

2019-06-21 Thread David Fetter
On Fri, Jun 21, 2019 at 05:20:54PM -0400, Alvaro Herrera wrote:
> On 2018-Jan-29, David Fetter wrote:
> 
> > We could certainly have \d call DESCRIBE for later versions of the
> > server.  \ commands which call different SQL depending on server
> > version have long been a standard practice.
> 
> So what is the uptake on implementing this at the server side, ie.
> DESCRIBE?

I've got a few Round Tuits available this weekend.  This seems like a
worthwhile thing to spend them on.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: POC: Cleaning up orphaned files using undo logs

2019-06-21 Thread Robert Haas
On Thu, Jun 20, 2019 at 4:54 AM Dilip Kumar wrote:
> The idea behind having the loop inside the undo machinery was that
> while traversing the blkprev chain, we can read all the undo records
> on the same undo page under one buffer lock.

That's not a bad goal, although invoking a user-supplied callback
while holding a buffer lock is a little scary.  If we stick with that,
it had better be clearly documented.  Perhaps worth noting:
ReadBuffer() is noticeably more expensive in terms of CPU consumption
than LockBuffer().  So it may work out that we keep a pin to avoid
redoing that and worry less about retaking the buffer locks.  But I'm
not sure: avoiding buffer locks is clearly good, too.

I have a couple of observations after poking at this some more.  One
is that it's not necessarily adequate to have an interface that
iterates backward through the undo chain until the callback returns
true.  Here's an example: suppose you want to walk backward through
the undo chain until you find the first undo record that corresponds
to a change you can see, and then return the undo record immediately
prior to that.  zheap doesn't really need this, because it (mostly?)
stores the XID of the next record we're going to look up in the
current record, and the XID of the first record we're going to look up
in the chain -- so it can tell from the record it's found so far
whether it should bother looking up the next record. That, however,
would not necessarily be true for some other AM.

Suppose you just store an undo pointer in the tuple header, as Heikki
proposed to do for zedstore.  Suppose further that each record has the
undo pointer for the previous record that modified that TID but not
necessarily the XID.  Then imagine a TID where we do an insert and a
bunch of in-place updates.  Then, a scan comes along with an old
snapshot.  It seems to me that what you need to do is walk backward
through the chain of undo records until you see one that has an XID
that is visible to your snapshot, and then the version of the tuple
that you want is in the payload of the next-newer undo record. So what
you want to do is something like this:

look up the undo pointer in the tuple.  call that the current undo record.
loop:
- look up the undo pointer in the current undo record.  call that the
previous undo record.
- if the XID from the previous undo record is visible, then stop; use
the tuple version from the current undo record.
- release the current undo record and let the new current undo record
be the previous undo record.

I'm not sure if this is actually a reasonable design from a
performance point of view, but it doesn't sound totally crazy, and
it's just to illustrate the point that there might be cases that are
too complicated for a loop-until-true model.  In this kind of loop, at
any given time you are holding onto two undo records, working your way
back through the undo log, and you just can't make this work with the
UndoFetchRecord callback interface. Possibly you could have a context
object that could hold onto one or a few buffer pins:

BeginUndoFetch(&cxt);
uur = UndoFetchRecord(&cxt, urp); // maybe do this a bunch of times
FinishUndoFetch(&cxt);

...but I'm not sure if that's exactly what we want either. Still, if
there is a significant savings from avoiding repinning and relocking
the buffer, we want to make it easy for people to get that advantage
as often as possible.

Another point that has occurred to me is that it is probably
impossible to avoid a fairly large number of duplicate undo fetches.
For instance, suppose somebody runs an UPDATE on a tuple that has been
recently updated.  The tuple_update method just gets a TID + snapshot,
so the AM basically has to go look up the tuple all over again,
including checking whether the latest version of the tuple is the one
visible to our snapshot.  So that means repinning and relocking the
same buffers and decoding the same undo record all over again.  I'm
not exactly sure what to do about this, but it seems like a potential
performance problem. I wonder if it's feasible to cache undo lookups
so that in common cases we can just reuse the result of a previous
lookup instead of doing a new one, and I wonder whether it's possible
to make that fast enough that it actually helps...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: \describe*

2019-06-21 Thread Alvaro Herrera
On 2018-Jan-29, David Fetter wrote:

> We could certainly have \d call DESCRIBE for later versions of the
> server.  \ commands which call different SQL depending on server
> version have long been a standard practice.

So what is the uptake on implementing this at the server side, ie.
DESCRIBE?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: allow_system_table_mods stuff

2019-06-21 Thread Andres Freund
Hi,

On 2019-06-21 16:37:16 -0400, Tom Lane wrote:
> We do have to get past the compatibility issue though.  My thought was
> that for a period of N years we could force allow_system_table_dml = on
> while running extension scripts, and then cease doing so.  This would
> give extension authors a reasonable window in which their scripts would
> work in either old or new backends.  At some point in that time they'd
> probably have occasion to make other changes that render their scripts
> not backwards compatible, at which point they can insert "SET
> allow_system_table_dml = on" so that the script keeps working when we
> remove the compatibility hack.

I'd modify this approach by having an allow_system_table_dml level that
warns when DML on system tables is performed, and then set
allow_system_table_dml to that when processing extension scripts (and
perhaps modify the warning message when creating_extension ==
true).  That way it'd be easier to spot such extension scripts.

And I'd personally probably just set allow_system_table_dml to warn when
working interactively, just to improve logging etc.


> This line of thought leads to the conclusion that we do want
> separate "allow_system_table_dml" and "allow_system_table_ddl"
> bools.  Otherwise, the backwards-compatibility hack would need
> to turn on a level of unsafety that extension scripts have *not*
> had before and surely shouldn't have by default.

+1

Greetings,

Andres Freund




Re: allow_system_table_mods stuff

2019-06-21 Thread Chapman Flack
On 6/21/19 4:37 PM, Tom Lane wrote:
> We do have to get past the compatibility issue though.  My thought was
> that for a period of N years we could force allow_system_table_dml = on
> while running extension scripts, and then cease doing so.  This would
> give extension authors a reasonable window in which their scripts would
> work in either old or new backends.  At some point in that time they'd
> probably have occasion to make other changes that render their scripts
> not backwards compatible, at which point they can insert "SET
> allow_system_table_dml = on" so that the script keeps working when we
> remove the compatibility hack.

I was having second thoughts too: maybe tweak ALTER EXTENSION
UPDATE to unconditionally force the flag on before running the update
script and reset it afterwards, but warn if the flag is actually still
set at the reset-after point.

Extension maintainers could then make the warning go away by releasing
versions where the update scripts contain an explicit RESET (at the very
top, if they do nothing fancy), or a(n initially redundant) SET at the
top and RESET at the bottom. No new control file syntax.

Regards,
-Chap




Re: allow_system_table_mods stuff

2019-06-21 Thread Tom Lane
Chapman Flack writes:
> I'd be leery of collateral damage from that to extension update scripts
> in extension releases currently in the wild.

Yeah, that's my primary concern here.

> Maybe there should be a new extension control file setting
> needs_system_table_mods = (boolean)
> which means what it says if it's there, but if an ALTER EXTENSION
> UPDATE sees a control file that lacks the setting, assume true
> (with a warning?).

I think that having a SET command in the update script is the way to go;
for one thing it simplifies testing the script by just sourcing it,
and for another it defaults to no-special-privileges which is the
right default.  Also, non-backward-compatible control files aren't
any nicer than non-backward-compatible script files.

We do have to get past the compatibility issue though.  My thought was
that for a period of N years we could force allow_system_table_dml = on
while running extension scripts, and then cease doing so.  This would
give extension authors a reasonable window in which their scripts would
work in either old or new backends.  At some point in that time they'd
probably have occasion to make other changes that render their scripts
not backwards compatible, at which point they can insert "SET
allow_system_table_dml = on" so that the script keeps working when we
remove the compatibility hack.

(Of course, we have an awful track record about actually doing things
N years after we said we would, but doesn't anybody around here have
a calendar app?)

This line of thought leads to the conclusion that we do want
separate "allow_system_table_dml" and "allow_system_table_ddl"
bools.  Otherwise, the backwards-compatibility hack would need
to turn on a level of unsafety that extension scripts have *not*
had before and surely shouldn't have by default.

regards, tom lane




Re: allow_system_table_mods stuff

2019-06-21 Thread Chapman Flack
On 6/21/19 3:07 PM, Stephen Frost wrote:
> When it comes to cases that fundamentally are one-off's and that we
> don't think really deserve a proper DDL command, then I'd say we make
> the extensions set the flag.  At least then it's clear "hey, we had to
> do something really grotty here, maybe don't copy this into your new
> extension, or don't use this method."  We should also un-set the flag
> after.

I'd be leery of collateral damage from that to extension update scripts
in extension releases currently in the wild.

Maybe there should be a new extension control file setting

needs_system_table_mods = (boolean)

which means what it says if it's there, but if an ALTER EXTENSION
UPDATE sees a control file that lacks the setting, assume true
(with a warning?).

Regards,
-Chap




Re: File descriptors inherited by restore_command

2019-06-21 Thread David Steele
On 6/21/19 10:26 AM, Stephen Frost wrote:
>>
>>> Another possible issue is that if we allow a child process to inherit
>>> all these fds it might accidentally write to them, which would be bad.
>>> I know the child process can go and maliciously open and trash files if
>>> it wants, but it doesn't seem like we should allow it to happen
>>> unintentionally.
>>
>> True.  But I don't want to think of this as a security issue, because
>> then it becomes a security bug to forget O_CLOEXEC anywhere in the
>> backend, and that is a standard we cannot meet.  (Even if we could
>> hold to it for the core code, stuff like libperl and libpython can't
>> be relied on to play ball.)  In practice, as long as we use O_CLOEXEC
>> for files opened by fd.c, that would eliminate the actual too-many-fds
>> hazard.  I don't object to desultorily looking around for other places
>> where we might want to add it, but personally I'd be satisfied with a
>> patch that CLOEXEC-ifies fd.c.
> 
> Agreed, it's not a security issue, and also agreed that we should
> probably get it done with fd.c right off, and then if someone wants to
> think about other places where it might be good to do then more power to
> them and it seems like we'd be happy to accept such patches.

I agree this is not a security issue and I wasn't intending to present
it that way, but in general the more fds closed the better.

I'll work up a patch for fd.c which is the obvious win and we can work
from there if it makes sense.  I'll be sure to test EXEC_BACKEND on
Linux but I don't think it will matter on Windows.  cfbot may feel
differently, though.

Regards,
-- 
-David
da...@pgmasters.net





Re: Problem with default partition pruning

2019-06-21 Thread Alvaro Herrera
On 2019-Jun-17, Shawn Wang wrote:

> I tested different types of key values and multi-level partitioned
> tables, and found no problems.
> Only the SQL in src/test/regress/results/partition_prune.out has a
> stray space that causes the regression test to fail.

It's not clear to me what patch you were reviewing.  The latest patch I
see in this thread, in [1], does not apply to any branch.  As another
test, I tried to apply the patch on commit 489e431ba56b (before Tom's
partprune changes in mid May); if you use "patch -p1
--ignore-whitespace" it is accepted, but the failure case proposed at
the start of the thread shows the same behavior (namely, that test1_def
is scanned when it is not needed):

55432 12devel 23506=# create table test1(id int, val text) partition by range 
(id);
create table test1_1 partition of test1 for values from (0) to (100);
create table test1_2 partition of test1 for values from (150) to (200);
create table test1_def partition of test1 default;
explain select * from test1 where id > 0 and id < 30;
CREATE TABLE
Duration: 5.736 ms
CREATE TABLE
Duration: 5.622 ms
CREATE TABLE
Duration: 3.585 ms
CREATE TABLE
Duration: 3.828 ms
   QUERY PLAN
─
 Append  (cost=0.00..58.16 rows=12 width=36)
   ->  Seq Scan on test1_1  (cost=0.00..29.05 rows=6 width=36)
 Filter: ((id > 0) AND (id < 30))
   ->  Seq Scan on test1_def  (cost=0.00..29.05 rows=6 width=36)
 Filter: ((id > 0) AND (id < 30))
(5 rows)

Duration: 2.465 ms


[1] https://postgr.es/m/00cf01d4eea7$afa43370$0eec9a50$@lab.ntt.co.jp

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: allow_system_table_mods stuff

2019-06-21 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> So there's certainly some fraction of these cases where we could have
> avoided doing manual catalog updates by expending work on some ALTER
> command instead.  But I don't see much reason to think that we could,
> or should try to, insist that every such case be done that way.  The
> cost/benefit ratio is not there in some cases, and in others, exposing
> a DDL command to do it would just be providing easier access to
> something that's fundamentally unsafe anyway.

In the cases where we can do better by providing a DDL command, it's
certainly my opinion that we should go that route.  I don't think we
should allow something that's fundamentally unsafe in that way- for
those cases though, how is the extension script making it 'safe'?  If it
simply is hoping, well, that smells like a bug, and we probably should
try to avoid having that in our extensions as folks do like to copy
them.

When it comes to cases that fundamentally are one-off's and that we
don't think really deserve a proper DDL command, then I'd say we make
the extensions set the flag.  At least then it's clear "hey, we had to
do something really grotty here, maybe don't copy this into your new
extension, or don't use this method."  We should also un-set the flag
after.

> The change-proargtypes example actually brings up a larger point:
> exactly how is, say, screwing with the contents of the pg_class
> row for a system catalog any safer than doing "DDL" on the catalog?
> I don't think we should fool ourselves that the one thing is
> inherently safer than the other.

I don't believe one to be safer than the other...

> In none of these cases are we ever going to be able to say "that's
> generically safe", or at least if we try, we're going to find that
> distinguishing safe cases from unsafe requires unreasonable amounts
> of effort.  I don't think it's a productive thing to spend time on.
> I don't mind having two separate "allow_system_table_ddl" and
> "allow_system_table_dml" flags, because it's easy to tell what each
> of those is supposed to enforce.

Which implies that it doesn't make sense to have two different flags
for it.

> But I'm going to run away screaming
> from any proposal to invent "allow_safe_system_table_dml".  It's a
> recipe for infinite security bugs and it's just not worth it.

Yeah, I'm not really a fan of that either.

Thanks,

Stephen




Re: allow_system_table_mods stuff

2019-06-21 Thread Tom Lane
Stephen Frost writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Keep in mind that DML-on-system-catalogs is unfortunately a really
>> standard hack in extension upgrade scripts.  (If memory serves,
>> some of our contrib scripts do that, and we've certainly told third
>> parties that it's the only way out of some box or other.)

> As with other cases where someone needs to do DML against the catalog
> for some reason or another- we should fix that.  If there's example
> cases, great!  Let's look at those and come up with a proper solution.

As I said, we've got examples.  I traced the existing UPDATEs-on-catalogs
in contrib scripts back to their origin commits, and found these:


commit a89b4b1be0d3550a7860250ff74dc69730555a1f
Update citext extension for parallel query.

This could have been done cleaner if we had ALTER AGGREGATE variants
that would let us add an aggcombine function after the fact and mark
the aggregate PARALLEL SAFE.  Seems like a reasonable feature, but
it still doesn't exist today, three years later.

commit 94be9e3f0ca9e7ced66168397eb586565bced9ca
Fix citext's upgrade-from-unpackaged script to set its collation correctly.
commit 9b97b7f8356c63ea0b6704718d75ea01ec3035bf
Fix citext upgrade script to update derived copies of pg_type.typcollation.

The difficulty here was lack of ALTER TYPE to change a type's
collation.  We'd have to rethink the denormalized storage of
typcollation in a lot of other places if we wanted to support that,
but possibly it'd be worth it.

commit 749a787c5b25ae33b3d4da0ef12aa05214aa73c7
Handle contrib's GIN/GIST support function signature changes honestly.

This needed to change the declared argument types of some support
functions, without having their OIDs change.  No, I *don't* think
it'd be a good idea to provide a DDL command to do that.

commit de1d042f5979bc1388e9a6d52a4d445342b04932
Support index-only scans in contrib/cube and contrib/seg GiST indexes.

"The only exciting part of this is that ALTER OPERATOR FAMILY lacks
a way to drop a support function that was declared as being part of
an opclass rather than being loose in the family.  For the moment,
we'll hack our way to a solution with a manual update of the pg_depend
entry type, which is what distinguishes the two cases.  Perhaps
someday it'll be worth providing a cleaner way to do that, but for
now it seems like a very niche problem."

commit 0024e348989254d48dc4afe9beab98a6994a791e
Fix upgrade of contrib/intarray and contrib/unaccent from 9.0.
commit 4eb49db7ae634fab9af7437b2e7b6388dfd83bd3
Fix contrib/pg_trgm to have smoother updates from 9.0.

More cases where we had to change the proargtypes of a pg_proc
entry without letting its OID change.

commit 472f608e436a41865b795c999bda3369725fa097
One more hack to make contrib upgrades from 9.0 match fresh 9.1 installs.

Lack of a way to replace a support function in an existing opclass.
It's possible this could be done better, but on the other hand, it'd
be *really* hard to have an ALTER OPCLASS feature for that that would
be even a little bit concurrency-safe.


So there's certainly some fraction of these cases where we could have
avoided doing manual catalog updates by expending work on some ALTER
command instead.  But I don't see much reason to think that we could,
or should try to, insist that every such case be done that way.  The
cost/benefit ratio is not there in some cases, and in others, exposing
a DDL command to do it would just be providing easier access to
something that's fundamentally unsafe anyway.

The change-proargtypes example actually brings up a larger point:
exactly how is, say, screwing with the contents of the pg_class
row for a system catalog any safer than doing "DDL" on the catalog?
I don't think we should fool ourselves that the one thing is
inherently safer than the other.

In none of these cases are we ever going to be able to say "that's
generically safe", or at least if we try, we're going to find that
distinguishing safe cases from unsafe requires unreasonable amounts
of effort.  I don't think it's a productive thing to spend time on.
I don't mind having two separate "allow_system_table_ddl" and
"allow_system_table_dml" flags, because it's easy to tell what each
of those is supposed to enforce.  But I'm going to run away screaming
from any proposal to invent "allow_safe_system_table_dml".  It's a
recipe for infinite security bugs and it's just not worth it.

regards, tom lane




progress report for ANALYZE

2019-06-21 Thread Alvaro Herrera
Hello

Here's a patch that implements progress reporting for ANALYZE.


Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 572fea555bc1c9d3512f7806aed6b97a374199eb Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 21 Jun 2019 13:35:13 -0400
Subject: [PATCH v1] Report progress for ANALYZE

---
 doc/src/sgml/monitoring.sgml | 129 +++
 src/backend/catalog/system_views.sql |  15 
 src/backend/commands/analyze.c   |  55 +++-
 src/backend/utils/adt/pgstatfuncs.c  |   2 +
 src/backend/utils/misc/sampling.c|   6 +-
 src/include/commands/progress.h  |  14 +++
 src/include/pgstat.h |   1 +
 src/include/utils/sampling.h |   2 +-
 src/test/regress/expected/rules.out  |  17 
 9 files changed, 237 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..a924ab3b06 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_analyzepg_stat_progress_analyze
+  One row for each backend (including autovacuum worker processes) running
+   ANALYZE, showing current progress.
+   See .
+  
+ 
+
  
   pg_stat_progress_clusterpg_stat_progress_cluster
   One row for each backend running
@@ -3927,6 +3935,127 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
 
+ 
+  ANALYZE Progress Reporting
+
+  
+   Whenever ANALYZE is running, the
+   pg_stat_progress_analyze view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  
+
+  
+   pg_stat_progress_analyze View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+ 
+  datid
+  oid
+  OID of the database to which this backend is connected.
+ 
+ 
+  datname
+  name
+  Name of the database to which this backend is connected.
+ 
+ 
+  relid
+  oid
+  OID of the table being analyzed.
+ 
+ 
+  include_children
+  boolean
+  Whether the current scan includes legacy inheritance children.
+ 
+ 
+  scanning_table
+  oid
+  
+   The table being scanned (differs from relid
+   only when processing partitions or inheritance children).
+  
+ 
+ 
+  phase
+  text
+  Current processing phase. See 
+ 
+ 
+ heap_blks_total
+ bigint
+ 
+   Total number of heap blocks to scan in the current table.
+ 
+ 
+
+ heap_blks_scanned
+ bigint
+ 
+   Number of heap blocks scanned.
+ 
+
+   
+   
+  
+
+  
+   ANALYZE phases
+   
+
+
+  Phase
+  Description
+ 
+
+   
+
+ initializing
+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
+
+
+ scanning table
+ 
+   The command is currently scanning the table.
+ 
+
+
+ analyzing sample
+ 
+   ANALYZE is currently extracting statistical data
+   from the sample obtained.
+ 
+
+
+ analyzing sample (extended stats)
+ 
+   ANALYZE is currently extracting statistical data
+   for extended statistics from the sample obtained.
+ 
+
+   
+   
+  
+ 
+
  
   CLUSTER Progress Reporting
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..e497f59790 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,21 @@ CREATE VIEW pg_stat_progress_vacuum AS
 FROM pg_stat_get_progress_info('VACUUM') AS S
 LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+SELECT
+S.pid AS pid, S.datid AS datid, D.datname AS datname,
+CAST(S.relid AS oid) AS relid,
+CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+CAST(S.param3 AS oid) AS scanning_table,
+CASE S.param1 WHEN 0 THEN 'initializing'
+  WHEN 1 THEN 'scanning table'
+  WHEN 2 THEN 'analyzing sample'
+  WHEN 3 THEN 'analyzing sample (extended stats)'
+  END AS phase,
+S.param4 AS heap_blks_total, S.param5 AS heap_blks_scanned
+FROM pg_stat_get_progress_info('ANALYZE') AS S
+LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
 SELECT
 S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 6cb54

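The view definition above encodes the phase in S.param1 and the block counts in S.param4/S.param5. A minimal Python sketch of that mapping (phase names copied from the posted patch; this is an illustration, not PostgreSQL code):

```python
# Mapping of the ANALYZE progress parameter (S.param1) to a phase name,
# mirroring the CASE expression in the pg_stat_progress_analyze view
# from the posted patch.
ANALYZE_PHASES = {
    0: "initializing",
    1: "scanning table",
    2: "analyzing sample",
    3: "analyzing sample (extended stats)",
}

def analyze_phase(param1: int) -> str:
    """Return the phase name for a raw progress parameter value."""
    return ANALYZE_PHASES.get(param1, "unknown")

def scan_progress(heap_blks_scanned: int, heap_blks_total: int) -> float:
    """Fraction of the current table scanned (param5 / param4)."""
    if heap_blks_total == 0:
        return 0.0
    return heap_blks_scanned / heap_blks_total
```

A monitoring tool would typically combine the two: report `analyze_phase` as the label and `scan_progress` as the percentage while the phase is "scanning table".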
Re: allow_system_table_mods stuff

2019-06-21 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-06-21 12:28:43 -0400, Tom Lane wrote:
> > Robert Haas  writes:
> > > A related issue is that alter_system_table_mods prohibits both stuff
> > > that's probably not going to cause any big problem and stuff that is
> > > almost guaranteed to make the system permanently unusable - e.g. you
> > > could 'SET STORAGE' on a system catalog column, which is really pretty
> > > innocuous, or you could change the oid column of pg_database to a
> > > varlena type, which is guaranteed to destroy the universe.  Here
> > > again, maybe some operations should be more protected than others, or
> > > maybe the relatively safe things just shouldn't be subject to
> > > allow_system_table_mods at all.
> > 
> > Meh.  It doesn't really seem to me that distinguishing these cases
> > is a productive use of code space or maintenance effort.  Superusers
> > are assumed to know what they're doing, and most especially so if
> > they've hit the big red button.
> 
> I really don't buy this. You need superuser for nearly all CREATE
> EXTENSION invocations, and for a lot of other routine tasks. Making the
> non-routine crazy stuff slightly harder is worthwhile. I don't think we
> can really separate those two into fully separate roles unfortunately,
> because the routine CREATE EXTENSION stuff obviously can be used to
> elevate privs.

I'm not sure what you're intending to respond to here, but I don't think
it's what was being discussed.

The question went something like this- if we decide that setting the
default for relpages made sense in some use-case, should a superuser
have to hit the big red button to do:

ALTER TABLE pg_class ALTER COLUMN relpages SET DEFAULT 100;

or should we just allow it?

Tom's opinion, if I followed it correctly, was 'no, that is rare enough
that it just is not worth the extra code to allow that without the big
red button, but deny everything else.'  My opinion was 'if they bring us
a legitimate use-case for such, then, sure, maybe we allow specifically
that without hitting the big red button.'  Robert was suggesting that we
could have a tri-state for a_s_t_m, where you could hit the big red
button only half-way, and certain things would then be allowed.

At least, I think that's about how it went.  None of it was about doing
typical CREATE EXTENSION and similar routine tasks that don't involve
running ALTER TABLE or DML against catalog tables.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: allow_system_table_mods stuff

2019-06-21 Thread Andres Freund
Hi,

On 2019-06-21 12:28:43 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > I kinda feel like we should prohibit DML on system catalogs, even by
> > superusers, unless you press the big red button that says "I am
> > definitely sure that I know what I'm doing."
> 
> Keep in mind that DML-on-system-catalogs is unfortunately a really
> standard hack in extension upgrade scripts.  (If memory serves,
> some of our contrib scripts do that, and we've certainly told third
> parties that it's the only way out of some box or other.)  I don't
> think we can just shut it off.  What you seem to be proposing is to
> allow it only after
> 
> SET allow_system_table_mods = on;
> 
> which would be all right except that an extension script containing
> such a command will fail outright in existing releases.  I think we
> need to be friendlier than that to extension authors who are, for the
> most part, trying to work around some deficiency of ours not theirs.

I'm not quite convinced we need to go very far with compatibility here -
pretty much by definition scripts that do this are tied a lot more to
the internals than ones using DDL. But if we want to, we could just -
for now at least - set allow_system_table_mods to a new 'warn' - when
processing extension scripts as superusers.


> > A related issue is that alter_system_table_mods prohibits both stuff
> > that's probably not going to cause any big problem and stuff that is
> > almost guaranteed to make the system permanently unusable - e.g. you
> > could 'SET STORAGE' on a system catalog column, which is really pretty
> > innocuous, or you could change the oid column of pg_database to a
> > varlena type, which is guaranteed to destroy the universe.  Here
> > again, maybe some operations should be more protected than others, or
> > maybe the relatively safe things just shouldn't be subject to
> > allow_system_table_mods at all.
> 
> Meh.  It doesn't really seem to me that distinguishing these cases
> is a productive use of code space or maintenance effort.  Superusers
> are assumed to know what they're doing, and most especially so if
> they've hit the big red button.

I really don't buy this. You need superuser for nearly all CREATE
EXTENSION invocations, and for a lot of other routine tasks. Making the
non-routine crazy stuff slightly harder is worthwhile. I don't think we
can really separate those two into fully separate roles unfortunately,
because the routine CREATE EXTENSION stuff obviously can be used to
elevate privs.

Greetings,

Andres Freund




Re: allow_system_table_mods stuff

2019-06-21 Thread Andres Freund
On 2019-06-21 11:12:38 +0200, Peter Eisentraut wrote:
> After the earlier thread [0] that dealt with ALTER TABLE on system
> catalogs, I took a closer look at the allow_system_table_mods setting.
> I found a few oddities, and it seems there is some room for improvement.

I complained about this recently again, and unfortunately the reaction
wasn't that welcoming:
https://postgr.es/m/20190509145054.byiwa255xvdbfh3a%40alap3.anarazel.de

> Attached are some patches to get the discussion rolling: One patch makes
> allow_system_table_mods settable at run time by superuser

+1 - this seems to have agreement.


> - For the most part, a_s_t_m establishes an additional level of access
> control on top of superuserdom for doing DDL on system catalogs.  That
> seems like a useful definition.
>
> - But enabling a_s_t_m also allows a non-superuser to do DML on system
> catalogs.  That seems like an entirely unrelated and surprising behavior.

Indeed.


> - Some checks are redundant with the pinning concept of the dependency
> system.  For example, you can't drop a system catalog even with a_s_t_m
> on.  That seems useful, of course, but as a result there is a bit of
> dead or useless code around.  (The dependency system is newer than a_s_t_m.)

I'm not fond of deduplicating things around this. This seems like a
separate layer of defense to me.


> - Having a test suite like this seems useful.

+1


> - The behavior that a_s_t_m allows non-superusers to do DML on system
> catalogs should be removed.  (Regular permissions can be used for that.)

+1


> - Dead code or code that is redundant with pinning should be removed.

-1


> Any other thoughts?

* a_s_t_m=off should forbid modifying catalog tables, even for
  superusers.



Greetings,

Andres Freund




Re: SQL/JSON path issues/questions

2019-06-21 Thread Alexander Korotkov
On Wed, Jun 19, 2019 at 10:14 PM Alexander Korotkov
 wrote:
> > While I have no objections to the proposed fixes, I think we can further
> > improve patch 0003 and the text it refers to.
> > In attempt to clarify jsonpath docs and address the concern that ? is
> > hard to trace in the current text, I'd also like to propose patch 0004.
> > Please see both of them attached.
>
> Thank you for your editing.  I'm going to commit them as well.
>
> But I'm going to commit your changes separately from the 0003 I've
> posted before, because 0003 fixes a factual error, while you're
> proposing a set of grammar/style fixes.

I made some review of these patches.  My notes are following:

   
-See also  for the aggregate
-function json_agg which aggregates record
-values as JSON, and the aggregate function
-json_object_agg which aggregates pairs of values
-into a JSON object, and their jsonb equivalents,
+See also  for details on
+json_agg function that aggregates record
+values as JSON, json_object_agg function
+that aggregates pairs of values into a JSON object, and their
jsonb equivalents,
 jsonb_agg and jsonb_object_agg.
   

This part is not directly related to jsonpath, and it has been there
for a long time.  I'd like a native English speaker to review this
change before committing it.


-Expression inside subscript may consititue an integer,
-numeric expression or any other jsonpath expression
-returning single numeric value.  The last keyword
-can be used in the expression denoting the last subscript in an array.
-That's helpful for handling arrays of unknown length.
+The specified index can be an integer,
+as well as a numeric or jsonpath expression that
+returns a single integer value. Zero index corresponds to the first
+array element. To access the last element in an array, you can use
+the last keyword, which is useful for handling
+arrays of unknown length.


I think this part requires more work.  Let's see what cases we have,
with examples:

1) Integer: '$.ar[1]'
2) Numeric: '$.ar[1.5]' (converted to integer)
3) Some numeric expression: '$.ar[last - 1]'
4) Arbitrary jsonpath expression: '$.ar[$.ar2.size() + $.num - 2]'

In principle, it is not necessary to distinguish 3 from 4, or 1 from 2.
Or we may not describe the cases at all, but just say it's a jsonpath
expression returning a numeric value, which is converted to an integer.

Also, note that we do not necessarily *access* the last array element
with the "last" keyword.  The "last" keyword denotes the index of the
last element in the expression, but a completely different element
might actually be accessed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
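The subscript cases enumerated above can be modeled with a small sketch. The code below is hypothetical Python, not the PostgreSQL jsonpath implementation: it models a subscript expression evaluating to a number that is converted to an integer index, with `last` bound to the index of the final element (represented here as a callable receiving `last`):

```python
def resolve_subscript(expr, array):
    """Model of jsonpath array subscripting as discussed in the thread.

    expr is an int, a float, or a callable taking `last` (the index of
    the final element).  The numeric result is converted to an integer
    index.  Illustration only; conversion/rounding details of the real
    implementation are not modeled here.
    """
    last = len(array) - 1
    if callable(expr):
        idx = expr(last)            # e.g. lambda last: last - 1
    else:
        idx = expr
    return array[int(idx)]          # numeric value converted to integer

ar = ["a", "b", "c", "d"]
resolve_subscript(1, ar)                      # '$.ar[1]'        -> "b"
resolve_subscript(lambda last: last, ar)      # '$.ar[last]'     -> "d"
resolve_subscript(lambda last: last - 1, ar)  # '$.ar[last - 1]' -> "c"
```

Note that this illustrates Alexander's point: `last` denotes an index usable inside the expression, and the element actually accessed (as in `last - 1`) need not be the last one.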




Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-21 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> I haven't been paying too close attention to this thread, but isn't
> >> that exactly what it does now and always has?  guc.c, at least, certainly
> >> is going to interpret duplicate entries that way.
> 
> > The issue isn't with reading them and interpreting them, it's what
> > happens when you run ALTER SYSTEM and it goes and modifies the file.
> > Presently, it basically operates on the first entry it finds when
> > performing a SET or a RESET.
> 
> Ah, got it.  So it seems like the correct behavior might be for
> ALTER SYSTEM to
> (a) run through the whole file and remove any conflicting lines;
> (b) append new setting at the end.

Sure- and every other tool that modifies that file should know that
*that* is how you do it, and therefore, if everyone is doing it right,
you don't ever end up with duplicates in the file.  If you do, someone's
doing it wrong, and we should issue a warning.

That's more-or-less the conclusion on the other thread, as I understood
it.

> If you had some fancy setup with comments associated with entries,
> you might not be pleased with that.  But I can't muster a lot of
> sympathy for tools putting comments in postgresql.auto.conf anyway;
> it's not intended to be a human-readable file.

If we were to *keep* the duplicates, then I could see value in including
information about prior configuration entries (I mean, that's what a lot
of external tools do with our postgresql.conf file- put it into git or
some other configuration management tool...).  If we aren't keeping the
dups, then I agree that there doesn't seem much point.

> If anybody does complain, my first reaction would be to make ALTER
> SYSTEM strip all comment lines too.

Uh, I believe it already does?

Thanks,

Stephen




Re: allow_system_table_mods stuff

2019-06-21 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > I kinda feel like we should prohibit DML on system catalogs, even by
> > superusers, unless you press the big red button that says "I am
> > definitely sure that I know what I'm doing."
> 
> Keep in mind that DML-on-system-catalogs is unfortunately a really
> standard hack in extension upgrade scripts.  (If memory serves,
> some of our contrib scripts do that, and we've certainly told third
> parties that it's the only way out of some box or other.)  I don't
> think we can just shut it off.  What you seem to be proposing is to
> allow it only after
> 
> SET allow_system_table_mods = on;

That's basically what my feeling is, yes.

> which would be all right except that an extension script containing
> such a command will fail outright in existing releases.  I think we
> need to be friendlier than that to extension authors who are, for the
> most part, trying to work around some deficiency of ours not theirs.

As with other cases where someone needs to do DML against the catalog
for some reason or another- we should fix that.  If there's example
cases, great!  Let's look at those and come up with a proper solution.

Other options include- letting an extension set that GUC (seems likely
that any case where this is needed is a case where the extension is
installing C functions and therefore is being run by a superuser
anyway...), or implicitly setting that GUC when we're running an
extension's script (urrggg...  I don't care for that one bit, but I
like it better than letting any superuser who wishes UPDATE random bits
in the catalog).

> I'm not saying that DML-off-by-default is a bad goal to work toward;
> I'm just saying "mind the collateral damage".

Sure, makes sense.

> > A related issue is that alter_system_table_mods prohibits both stuff
> > that's probably not going to cause any big problem and stuff that is
> > almost guaranteed to make the system permanently unusable - e.g. you
> > could 'SET STORAGE' on a system catalog column, which is really pretty
> > innocuous, or you could change the oid column of pg_database to a
> > varlena type, which is guaranteed to destroy the universe.  Here
> > again, maybe some operations should be more protected than others, or
> > maybe the relatively safe things just shouldn't be subject to
> > allow_system_table_mods at all.
> 
> Meh.  It doesn't really seem to me that distinguishing these cases
> is a productive use of code space or maintenance effort.  Superusers
> are assumed to know what they're doing, and most especially so if
> they've hit the big red button.

The direction I took the above was that we should actually be thinking
about if there are acceptable cases to be running DDL against the
catalog and, if so, specifically allow those.  I'm not convinced at the
moment that any such exist and therefore I'd rather have it denied
(unless you push the big red button) and then tell people to show us
their use case and then we can decide if it's an 'ok' thing to allow, or
what.

I'd really like to stop the cases like stackoverflow articles that
describe how to "remove" an enum value by simply modifying the catalog,
or at least make them have to add a "well, push this big red button
first that the PG people tell you never to push, and then.." to the
start.

Thanks,

Stephen




Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-21 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I haven't been paying too close attention to this thread, but isn't
>> that exactly what it does now and always has?  guc.c, at least, certainly
>> is going to interpret duplicate entries that way.

> The issue isn't with reading them and interpreting them, it's what
> happens when you run ALTER SYSTEM and it goes and modifies the file.
> Presently, it basically operates on the first entry it finds when
> performing a SET or a RESET.

Ah, got it.  So it seems like the correct behavior might be for
ALTER SYSTEM to
(a) run through the whole file and remove any conflicting lines;
(b) append new setting at the end.

If you had some fancy setup with comments associated with entries,
you might not be pleased with that.  But I can't muster a lot of
sympathy for tools putting comments in postgresql.auto.conf anyway;
it's not intended to be a human-readable file.

If anybody does complain, my first reaction would be to make ALTER
SYSTEM strip all comment lines too.

regards, tom lane
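The (a)/(b) behavior Tom describes can be sketched as follows. This is a hypothetical Python illustration of rewriting postgresql.auto.conf content, not the actual guc.c code; it keeps comment lines, though the thread suggests those could be stripped as well:

```python
def set_auto_conf(lines, name, value):
    """Rewrite postgresql.auto.conf content for SET name = value:
    (a) drop every existing line that sets the same GUC, then
    (b) append the new setting at the end.
    """
    kept = []
    for line in lines:
        stripped = line.strip()
        # Preserve blank lines and comments unchanged.
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        key = stripped.split("=", 1)[0].strip()
        if key != name:
            kept.append(line)
    kept.append(f"{name} = '{value}'")
    return kept

conf = [
    "# Do not edit this file manually!",
    "work_mem = '4MB'",
    "work_mem = '8MB'",            # duplicate appended by an external tool
    "shared_buffers = '128MB'",
]
set_auto_conf(conf, "work_mem", "16MB")
# drops both work_mem lines, keeps the comment and shared_buffers,
# and appends work_mem = '16MB' at the end
```

With this scheme, a tool that merely appends to the file never causes lasting duplicates: the next ALTER SYSTEM on that GUC collapses them.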




Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-21 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > To me, forcing every tools author to use postgresql.conf parsing tools
> > rather than just appending to the file is a needless burden on tool
> > authors.  I'd vote for just having ALTER SYSTEM silently drop all but
> > the last of duplicated entries.
> 
> I haven't been paying too close attention to this thread, but isn't
> that exactly what it does now and always has?  guc.c, at least, certainly
> is going to interpret duplicate entries that way.

The issue isn't with reading them and interpreting them, it's what
happens when you run ALTER SYSTEM and it goes and modifies the file.
Presently, it basically operates on the first entry it finds when
performing a SET or a RESET.

Which also means that you can issue SETs to your heart's content, and
if there's a duplicate for that GUC, you'll never actually change what
is interpreted.

Thanks,

Stephen




Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-21 Thread Tom Lane
Robert Haas  writes:
> To me, forcing every tools author to use postgresql.conf parsing tools
> rather than just appending to the file is a needless burden on tool
> authors.  I'd vote for just having ALTER SYSTEM silently drop all but
> the last of duplicated entries.

I haven't been paying too close attention to this thread, but isn't
that exactly what it does now and always has?  guc.c, at least, certainly
is going to interpret duplicate entries that way.

regards, tom lane




Re: allow_system_table_mods stuff

2019-06-21 Thread Tom Lane
Robert Haas  writes:
> I kinda feel like we should prohibit DML on system catalogs, even by
> superusers, unless you press the big red button that says "I am
> definitely sure that I know what I'm doing."

Keep in mind that DML-on-system-catalogs is unfortunately a really
standard hack in extension upgrade scripts.  (If memory serves,
some of our contrib scripts do that, and we've certainly told third
parties that it's the only way out of some box or other.)  I don't
think we can just shut it off.  What you seem to be proposing is to
allow it only after

SET allow_system_table_mods = on;

which would be all right except that an extension script containing
such a command will fail outright in existing releases.  I think we
need to be friendlier than that to extension authors who are, for the
most part, trying to work around some deficiency of ours not theirs.

I'm not saying that DML-off-by-default is a bad goal to work toward;
I'm just saying "mind the collateral damage".

> A related issue is that alter_system_table_mods prohibits both stuff
> that's probably not going to cause any big problem and stuff that is
> almost guaranteed to make the system permanently unusable - e.g. you
> could 'SET STORAGE' on a system catalog column, which is really pretty
> innocuous, or you could change the oid column of pg_database to a
> varlena type, which is guaranteed to destroy the universe.  Here
> again, maybe some operations should be more protected than others, or
> maybe the relatively safe things just shouldn't be subject to
> allow_system_table_mods at all.

Meh.  It doesn't really seem to me that distinguishing these cases
is a productive use of code space or maintenance effort.  Superusers
are assumed to know what they're doing, and most especially so if
they've hit the big red button.

regards, tom lane




Re: Minimal logical decoding on standbys

2019-06-21 Thread Amit Khandekar
On Thu, 20 Jun 2019 at 00:31, Andres Freund  wrote:
> On 2019-06-12 17:30:02 +0530, Amit Khandekar wrote:
> > In the attached v6 version of the patch, I did the above. That is, I
> > used XLogFindNextRecord() to bump up the restart_lsn of the slot to
> > the first valid record. But since XLogReaderState is not available in
> > ReplicationSlotReserveWal(), I did this in
> > DecodingContextFindStartpoint(). And then updated the slot restart_lsn
> > with this corrected position.
>
> > Since XLogFindNextRecord() is currently disabled using #if 0, removed
> > this directive.
>
> Well, ifdef FRONTEND. I don't think that's a problem. It's a bit
> overkill here, because I think we know the address has to be on a record
> boundary (rather than being in the middle of a page spanning WAL
> record). So we could just add the size of the header manually
> - but I think that's not worth doing.
>
>
> > > Or else, do you think we can just increment the record pointer by
> > > doing something like (lastReplayedEndRecPtr % XLOG_BLCKSZ) +
> > > SizeOfXLogShortPHD() ?
> >
> > I found out that we can't do this, because we don't know whether the
> > xlog header is SizeOfXLogShortPHD or SizeOfXLogLongPHD. In fact, in
> > our context, it is SizeOfXLogLongPHD. So we indeed need the
> > XLogReaderState handle.
>
> Well, we can determine whether a long or a short header is going to be
> used, as that's solely dependent on the LSN:
>
> /*
>  * If first page of an XLOG segment file, make it a long 
> header.
>  */
> if ((XLogSegmentOffset(NewPage->xlp_pageaddr, 
> wal_segment_size)) == 0)
> {
> XLogLongPageHeader NewLongPage = (XLogLongPageHeader) 
> NewPage;
>
> NewLongPage->xlp_sysid = 
> ControlFile->system_identifier;
> NewLongPage->xlp_seg_size = wal_segment_size;
> NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
> NewPage->xlp_info |= XLP_LONG_HEADER;
> }
>
> but I don't think that's worth it.

Ok, so what you are saying is: in the case of ReplayRecPtr, it is
always possible to know whether it is pointing at a long header or a
short header just by looking at its value, and then we just increment
it by the header size once we know it. Why do you think it is not
worth it? In fact, I thought we *have* to increment it to set it to
the next record. I didn't understand what other option we have.

>
>
> > > Do you think that we can solve this using some other approach ? I am
> > > not sure whether it's only the initial conditions that cause
> > > lastReplayedEndRecPtr value to *not* point to a valid record, or is it
> > > just a coincidence and that lastReplayedEndRecPtr can also have such a
> > > value any time afterwards.
>
> It's always possible. All that means is that the last record filled the
> entire last WAL page.

Ok that means we *have* to bump the pointer ahead.

>
>
> > > If it's only possible initially, we can
> > > just use GetRedoRecPtr() instead of lastReplayedEndRecPtr if
> > > lastReplayedEndRecPtr is invalid.
>
> I don't think so? The redo pointer will point to something *much*
> earlier, where we'll not yet have done all the necessary conflict
> handling during recovery? So we'd not necessarily notice that a slot
> is not actually usable for decoding.
>
> We could instead just handle that by starting decoding at the redo
> pointer, and just ignore all WAL records until they're after
> lastReplayedEndRecPtr, but that has no advantages, and will read a lot
> more WAL.

Yeah I agree : just doing this for initial case is a bad idea.

>
>
>
>
> >  static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
> > @@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, 
> > TransactionId latestRemovedX
> >*/
> >
> >   /* XLOG stuff */
> > + xlrec_reuse.onCatalogTable = 
> > get_rel_logical_catalog(rel->rd_index->indrelid);
> >   xlrec_reuse.node = rel->rd_node;
> >   xlrec_reuse.block = blkno;
> >   xlrec_reuse.latestRemovedXid = latestRemovedXid;
> > @@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
> >   XLogRecPtr  recptr;
> >   xl_btree_delete xlrec_delete;
> >
> > + xlrec_delete.onCatalogTable = 
> > get_rel_logical_catalog(rel->rd_index->indrelid);
> >   xlrec_delete.latestRemovedXid = latestRemovedXid;
> >   xlrec_delete.nitems = nitems;
>
> Can we instead pass the heap rel down to here? I think there's only one
> caller, and it has the heap relation available these days (it didn't at
> the time of the prototype, possibly).  There's a few other users of
> get_rel_logical_catalog() where that might be harder, but it's easy
> here.

For _bt_log_reuse_page(), its only caller is _bt_getbuf(), which does
not have a heapRel parameter. Let me know which caller you were
referri

Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-21 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Jun 17, 2019 at 10:50 AM Ian Barwick
>  wrote:
> > In Pg12, the code in pg_basebackup implies the correct thing to do is 
> > append to .auto.conf,
> > but as demonstrated that can cause problems with duplicate entries.
> 
> Yeah.
> 
> To me, forcing every tools author to use postgresql.conf parsing tools
> rather than just appending to the file is a needless burden on tool
> authors.  I'd vote for just having ALTER SYSTEM silently drop all but
> the last of duplicated entries.
> 
> It sounds like I might be in the minority, but I feel like the
> reactions which suggest that this is somehow heresy are highly
> overdone. Given that the very first time somebody wanted to do
> something like this in core, they picked this approach, I think we can
> assume that it is a natural approach which other people will also
> attempt. There doesn't seem to be any good reason for it not to Just
> Work.

That's not quite accurate, given that it isn't how the ALTER SYSTEM call
itself works, and clearly isn't how the authors of that feature expected
things to work or they would have actually made it work.  They didn't,
and it doesn't actually work.

The notion that pg_basebackup was correct in this, when it wasn't tested
at all, evidently, even after the concern was raised, and ALTER SYSTEM
was wrong, even though it worked just fine before some later patch
started making changes to the file, based on the idea that it's the
"natural approach" doesn't make sense to me.

If the change to pg_basebackup had included a change to ALTER SYSTEM to
make it work the *same* way that pg_basebackup now does, or at least to
work with the changes that pg_basebackup were making, then maybe
everything would have been fine.

That is to say, if your recommendation is to change everything that
modifies postgresql.auto.conf to *always* append (and maybe even include
a comment about when, and who, made the change..), and to make
everything work correctly with that, then that seems like it might be a
reasonable approach (though dealing with RESETs might be a little ugly..
haven't fully thought about that).

I still don't feel that having ALTER SYSTEM just remove duplicates is a
good idea and I do think it'll lead to confusion.

Thanks,

Stephen




Re: allow_system_table_mods stuff

2019-06-21 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Jun 21, 2019 at 5:12 AM Peter Eisentraut
>  wrote:
> > Any other thoughts?
> 
> I kinda feel like we should prohibit DML on system catalogs, even by
> superusers, unless you press the big red button that says "I am
> definitely sure that I know what I'm doing." Linking that with
> allow_system_table_mods in some way seems natural, but I'm not totally
> sure it's the right thing to do.  I guess we could have
> allow_system_table_mods={no,yes,yesyesyes}, the former allowing DML
> and not-too-scary things and the latter allowing anything at all.

I agree that we should be strongly discouraging even superusers from
doing DML or DDL on system catalogs, and making them jump through hoops
to make it happen at all.

> A related issue is that alter_system_table_mods prohibits both stuff
> that's probably not going to cause any big problem and stuff that is
> almost guaranteed to make the system permanently unusable - e.g. you
> could 'SET STORAGE' on a system catalog column, which is really pretty
> innocuous, or you could change the oid column of pg_database to a
> varlena type, which is guaranteed to destroy the universe.  Here
> again, maybe some operations should be more protected than others, or
> maybe the relatively safe things just shouldn't be subject to
> allow_system_table_mods at all.

If there are things which are through proper grammar (ALTER TABLE or
such) and which will actually usefully work when done against a system
catalog table (eg: GRANT), then I'm all for just allowing that, provided
the regular security checks are done.  I don't think we should ever be
allowed DML though, or any DDL which we know will break the system,
without making them go through hoops.  Personally, I'd rather disallow
all DDL on system catalogs and then explicitly add support for specific
DDL when someone complains and has done a sufficient review to show that
allowing that DDL is a good thing and will actually work as intended.

Thanks,

Stephen




Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-21 Thread Robert Haas
On Mon, Jun 17, 2019 at 10:50 AM Ian Barwick
 wrote:
> In Pg12, the code in pg_basebackup implies the correct thing to do is append 
> to .auto.conf,
> but as demonstrated that can cause problems with duplicate entries.

Yeah.

To me, forcing every tools author to use postgresql.conf parsing tools
rather than just appending to the file is a needless burden on tool
authors.  I'd vote for just having ALTER SYSTEM silently drop all but
the last of duplicated entries.

It sounds like I might be in the minority, but I feel like the
reactions which suggest that this is somehow heresy are highly
overdone. Given that the very first time somebody wanted to do
something like this in core, they picked this approach, I think we can
assume that it is a natural approach which other people will also
attempt. There doesn't seem to be any good reason for it not to Just
Work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: allow_system_table_mods stuff

2019-06-21 Thread Robert Haas
On Fri, Jun 21, 2019 at 5:12 AM Peter Eisentraut
 wrote:
> Attached are some patches to get the discussion rolling: One patch makes
> allow_system_table_mods settable at run time by superuser, the second
> one is a test suite that documents the current behavior that I gathered
> after analyzing the source code, the third one removes some code that
> was found useless by the tests.  (The first patch might be useful on its
> own, but right now it's just to facilitate the test suite.)

Sounds generally sensible (but I didn't read the code).  I
particularly like the first idea.

> Any other thoughts?

I kinda feel like we should prohibit DML on system catalogs, even by
superusers, unless you press the big red button that says "I am
definitely sure that I know what I'm doing." Linking that with
allow_system_table_mods in some way seems natural, but I'm not totally
sure it's the right thing to do.  I guess we could have
allow_system_table_mods={no,yes,yesyesyes}, the former allowing DML
and not-too-scary things and the latter allowing anything at all.

A related issue is that allow_system_table_mods prohibits both stuff
that's probably not going to cause any big problem and stuff that is
almost guaranteed to make the system permanently unusable - e.g. you
could 'SET STORAGE' on a system catalog column, which is really pretty
innocuous, or you could change the oid column of pg_database to a
varlena type, which is guaranteed to destroy the universe.  Here
again, maybe some operations should be more protected than others, or
maybe the relatively safe things just shouldn't be subject to
allow_system_table_mods at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: File descriptors inherited by restore_command

2019-06-21 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> David Steele  writes:
> > On 6/21/19 9:45 AM, Tom Lane wrote:
> >> +1 for using O_CLOEXEC on machines that have it.  I don't think I want to
> >> jump through hoops for machines that don't have it --- POSIX has required
> >> it for some time, so there should be few machines in that category.
> 
> > Another possible issue is that if we allow a child process to inherit
> > all these fds it might accidentally write to them, which would be bad.
> > I know the child process can go and maliciously open and trash files if
> > it wants, but it doesn't seem like we should allow it to happen
> > unintentionally.
> 
> True.  But I don't want to think of this as a security issue, because
> then it becomes a security bug to forget O_CLOEXEC anywhere in the
> backend, and that is a standard we cannot meet.  (Even if we could
> hold to it for the core code, stuff like libperl and libpython can't
> be relied on to play ball.)  In practice, as long as we use O_CLOEXEC
> for files opened by fd.c, that would eliminate the actual too-many-fds
> hazard.  I don't object to desultorily looking around for other places
> where we might want to add it, but personally I'd be satisfied with a
> patch that CLOEXEC-ifies fd.c.

Agreed, it's not a security issue, and also agreed that we should
probably get it done with fd.c right off, and then if someone wants to
think about other places where it might be good to do then more power to
them and it seems like we'd be happy to accept such patches.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: File descriptors inherited by restore_command

2019-06-21 Thread Tom Lane
I wrote:
> In practice, as long as we use O_CLOEXEC
> for files opened by fd.c, that would eliminate the actual too-many-fds
> hazard.  I don't object to desultorily looking around for other places
> where we might want to add it, but personally I'd be satisfied with a
> patch that CLOEXEC-ifies fd.c.

Actually, even that much coverage might be exciting.  Be sure to test
patch with EXEC_BACKEND to see if it causes zapping of any files the
postmaster needs to pass down to backends.

regards, tom lane




Re: File descriptors inherited by restore_command

2019-06-21 Thread Tom Lane
David Steele  writes:
> On 6/21/19 9:45 AM, Tom Lane wrote:
>> +1 for using O_CLOEXEC on machines that have it.  I don't think I want to
>> jump through hoops for machines that don't have it --- POSIX has required
>> it for some time, so there should be few machines in that category.

> Another possible issue is that if we allow a child process to inherit
> all these fds it might accidentally write to them, which would be bad.
> I know the child process can go and maliciously open and trash files if
> it wants, but it doesn't seem like we should allow it to happen
> unintentionally.

True.  But I don't want to think of this as a security issue, because
then it becomes a security bug to forget O_CLOEXEC anywhere in the
backend, and that is a standard we cannot meet.  (Even if we could
hold to it for the core code, stuff like libperl and libpython can't
be relied on to play ball.)  In practice, as long as we use O_CLOEXEC
for files opened by fd.c, that would eliminate the actual too-many-fds
hazard.  I don't object to desultorily looking around for other places
where we might want to add it, but personally I'd be satisfied with a
patch that CLOEXEC-ifies fd.c.

regards, tom lane




Re: File descriptors inherited by restore_command

2019-06-21 Thread David Steele
On 6/21/19 9:45 AM, Tom Lane wrote:
> David Steele  writes:
>> While investigating "Too many open files" errors reported in our
>> parallel restore_command I noticed that the restore_command can inherit
>> quite a lot of fds from the recovery process.  This limits the number of
>> fds available in the restore_command depending on the setting of system
>> nofile and Postgres max_files_per_process.
> 
> Hm.  Presumably you could hit the same issue with things like COPY FROM
> PROGRAM.  And the only reason the archiver doesn't hit it is it never
> opens many files to begin with.

Yes.  The archiver process is fine because it has ~8 fds open.

>> I was wondering if we should consider closing these fds before calling
>> restore_command?  It seems like we could do this by forking first or by
>> setting FD_CLOEXEC using fcntl() or O_CLOEXEC on open() where available.
> 
> +1 for using O_CLOEXEC on machines that have it.  I don't think I want to
> jump through hoops for machines that don't have it --- POSIX has required
> it for some time, so there should be few machines in that category.

Another possible issue is that if we allow a child process to inherit
all these fds it might accidentally write to them, which would be bad.
I know the child process can go and maliciously open and trash files if
it wants, but it doesn't seem like we should allow it to happen
unintentionally.

Regards,
-- 
-David
da...@pgmasters.net




Re: using explicit_bzero

2019-06-21 Thread Tom Lane
ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes:
> Also, on Linux it requires libbsd: https://libbsd.freedesktop.org/
> (which seems to be down, but
> https://packages.debian.org/buster/libbsd-dev has a list of the
> functions it provides).

Ugh, that could be a bit nasty.  I might be misremembering, but
my hindbrain is running for cover and yelling something about how
importing libbsd changes signal semantics.  Our git log has a few
scary references to other bad side-effects of -lbsd (cf 55c235b26,
1337751e5, a27fafecc).  On the whole, I'm not excited about pulling
in a library whose entire purpose is to mess with POSIX semantics.

regards, tom lane




Google Season of Docs

2019-06-21 Thread Mahesh S
Hello,
I am Mahesh S Nair from India. I am a GSoC 2018 student at KDE. I am very
much interested in working with the organization for Google Season of Docs
2019. I think I am capable of open-source technical writing work, as I
have done GSoC and was also a GCI mentor.
I am actually new to this community, but I have used PostgreSQL quite often
in my academics.

 I have started to read more on the topics too. I wish to draft a
proposal(1st draft) for GSoD as soon as possible.

Please let me know about the opportunities.

Thank You

-- 
Thank you from
Mahesh S Nair
Amrita University 


Re: using explicit_bzero

2019-06-21 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Peter Eisentraut  writes:
>> +#ifndef HAVE_EXPLICIT_BZERO
>> +#define explicit_bzero(b, len) bzero(b, len)
>> +#endif
>
> This presumes that every platform has bzero, which is unsafe (POSIX
> doesn't specify it) and is an assumption we kicked to the curb a dozen
> years ago (067a5cdb3).  Please use memset() for the substitute instead.
>
> Also, I'm a bit suspicious of using AC_CHECK_FUNCS for this; that
> generally Doesn't Work for anything that's not a vanilla out-of-line
> function.  Are we worried about people implementing this as a macro,
> compiler built-in, etc?

Also, on Linux it requires libbsd: https://libbsd.freedesktop.org/
(which seems to be down, but
https://packages.debian.org/buster/libbsd-dev has a list of the
functions it provides).

- ilmari
-- 
"A disappointingly low fraction of the human race is,
 at any given time, on fire." - Stig Sandbeck Mathisen




Re: File descriptors inherited by restore_command

2019-06-21 Thread Tom Lane
David Steele  writes:
> While investigating "Too many open files" errors reported in our
> parallel restore_command I noticed that the restore_command can inherit
> quite a lot of fds from the recovery process.  This limits the number of
> fds available in the restore_command depending on the setting of system
> nofile and Postgres max_files_per_process.

Hm.  Presumably you could hit the same issue with things like COPY FROM
PROGRAM.  And the only reason the archiver doesn't hit it is it never
opens many files to begin with.

> I was wondering if we should consider closing these fds before calling
> restore_command?  It seems like we could do this by forking first or by
> setting FD_CLOEXEC using fcntl() or O_CLOEXEC on open() where available.

+1 for using O_CLOEXEC on machines that have it.  I don't think I want to
jump through hoops for machines that don't have it --- POSIX has required
it for some time, so there should be few machines in that category.

regards, tom lane




File descriptors inherited by restore_command

2019-06-21 Thread David Steele
Hackers,

While investigating "Too many open files" errors reported in our
parallel restore_command I noticed that the restore_command can inherit
quite a lot of fds from the recovery process.  This limits the number of
fds available in the restore_command depending on the setting of system
nofile and Postgres max_files_per_process.

I was wondering if we should consider closing these fds before calling
restore_command?  It seems like we could do this by forking first or by
setting FD_CLOEXEC using fcntl() or O_CLOEXEC on open() where available.

Thoughts on this?  Is this something we want to change or should I just
recommend that users set nofile and max_files_per_process appropriately?

Regards,
-- 
-David
da...@pgmasters.net




Re: using explicit_bzero

2019-06-21 Thread Tom Lane
Peter Eisentraut  writes:
> +#ifndef HAVE_EXPLICIT_BZERO
> +#define explicit_bzero(b, len) bzero(b, len)
> +#endif

This presumes that every platform has bzero, which is unsafe (POSIX
doesn't specify it) and is an assumption we kicked to the curb a dozen
years ago (067a5cdb3).  Please use memset() for the substitute instead.

Also, I'm a bit suspicious of using AC_CHECK_FUNCS for this; that
generally Doesn't Work for anything that's not a vanilla out-of-line
function.  Are we worried about people implementing this as a macro,
compiler built-in, etc?

regards, tom lane




Re: ldapbindpasswdfile

2019-06-21 Thread Stephen Frost
Greetings,

* Thomas Munro (thomas.mu...@gmail.com) wrote:
> I also know that a motivated user could also use GSSAPI instead of
> LDAP.  Do you think we should update the manual to say so, perhaps in
> a "tip" box on the LDAP auth page?

Hrm, not sure how I missed this before, but, yes, I'm all for adding a
'tip' box on the LDAP auth page which recommends use of GSSAPI when
available (such as when operating in an Active Directory
environment...).  Note that, technically, you can run LDAP without using
Active Directory and without running any kind of KDC, so we can't just
blanket say "use GSSAPI" because there exists use-cases where that isn't
an option.

Not that I've ever actually *encountered* such an environment, but
people have assured me that they do, in fact, exist, and that there are
users of PG LDAP auth with such a setup who would be upset to see
support for it removed.

Anyhow, yes, a 'tip' would be great to add.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: O(N^2) when building multi-column MCV lists

2019-06-21 Thread Tomas Vondra

Hi,

Attached is a WIP/PoC fix addressing the O(N^2) behavior in ANALYZE with
high statistic target values. It needs more work, but it's good enough to
show some measurements.

For benchmark, I've created a simple 2-column table, with MCV list on
those two columns:

 CREATE TABLE t (a int, b int);
 CREATE STATISTICS s (mcv) ON a,b FROM t;

and then loaded data sets with different numbers of random combinations,
determined by number of values in each column. For example with 10 values
in a column, you get ~100 combinations.

 INSERT INTO t
 SELECT 10*random(), 10*random() FROM generate_series(1,3e6);

The 3M rows are picked because that's the sample size with statistics target 10000.

The results with different statistic targets look like this:

1) master

   values      100     1000     5000    10000
   ------------------------------------------
       10      103      586     2419     3041
      100      116      789     4778     8934
     1000      116      690     3162   499748

2) patched

   values      100     1000     5000    10000
   ------------------------------------------
       10      113      606     2460     3716
      100      143      711     3371     5231
     1000      156      994     3836     6002

3) comparison (patched / master)

   values      100     1000     5000    10000
   ------------------------------------------
       10     110%     103%     102%     122%
      100     123%      90%      71%      59%
     1000     134%     144%     121%       1%


So clearly, the issue for large statistic targets is resolved (duration
drops from 500s to just 6s), but there is measurable regression for the
other cases. That needs more investigation & fix before commit.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 5fe61ea0a4..6dc4ed37d8 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -78,6 +78,9 @@ static MultiSortSupport build_mss(VacAttrStats **stats, int 
numattrs);
 static SortItem *build_distinct_groups(int numrows, SortItem *items,
   
MultiSortSupport mss, int *ndistinct);
 
+static SortItem **compute_column_counts(SortItem *groups, int ngroups,
+   
MultiSortSupport mss, int *ncounts);
+
 static int count_distinct_groups(int numrows, SortItem *items,
  
MultiSortSupport mss);
 
@@ -172,6 +175,8 @@ statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset 
*attrs,
SortItem   *groups;
MCVList*mcvlist = NULL;
MultiSortSupport mss;
+   SortItem  **counts;
+   int*ncounts;
 
attnums = build_attnums_array(attrs, &numattrs);
 
@@ -188,6 +193,10 @@ statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset 
*attrs,
/* transform the sorted rows into groups (sorted by frequency) */
groups = build_distinct_groups(nitems, items, mss, &ngroups);
 
+   /* compute frequencies for values in each column */
+   ncounts = (int *) palloc0(sizeof(int) * numattrs);
+   counts = compute_column_counts(groups, ngroups, mss, ncounts);
+
/*
 * Maximum number of MCV items to store, based on the attribute with the
 * largest stats target (and the number of groups we have available).
@@ -242,6 +251,16 @@ statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset 
*attrs,
if (nitems > 0)
{
int j;
+   SortItemkey;
+   MultiSortSupporttmp;
+
+   /* used to search values */
+   tmp = (MultiSortSupport) palloc(offsetof(MultiSortSupportData, 
ssup)
+   
+ sizeof(SortSupportData));
+
+   /* space for search key */
+   key.values = palloc(sizeof(Datum));
+   key.isnull = palloc(sizeof(bool));
 
/*
 * Allocate the MCV list structure, set the global parameters.
@@ -281,16 +300,21 @@ statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset 
*attrs,
item->base_frequency = 1.0;
for (j = 0; j < numattrs; j++)
{
-   int count = 0;
-   int k;
+   SortItem   *result;
 
-   for (k = 0; k < ngroups; k++)
-   {
-   if (multi_sort_comp

allow_system_table_mods stuff

2019-06-21 Thread Peter Eisentraut
After the earlier thread [0] that dealt with ALTER TABLE on system
catalogs, I took a closer look at the allow_system_table_mods setting.
I found a few oddities, and it seems there is some room for improvement.

Attached are some patches to get the discussion rolling: One patch makes
allow_system_table_mods settable at run time by superuser, the second
one is a test suite that documents the current behavior that I gathered
after analyzing the source code, the third one removes some code that
was found useless by the tests.  (The first patch might be useful on its
own, but right now it's just to facilitate the test suite.)

Some observations:

- For the most part, a_s_t_m establishes an additional level of access
control on top of superuserdom for doing DDL on system catalogs.  That
seems like a useful definition.

- But enabling a_s_t_m also allows a non-superuser to do DML on system
catalogs.  That seems like an entirely unrelated and surprising behavior.

- Some checks are redundant with the pinning concept of the dependency
system.  For example, you can't drop a system catalog even with a_s_t_m
on.  That seems useful, of course, but as a result there is a bit of
dead or useless code around.  (The dependency system is newer than a_s_t_m.)

- The source code comments indicate that SET STATISTICS on system
catalogs is supposed to be allowed without a_s_t_m, but it actually
doesn't work.

Proposals and discussion points:

- Having a test suite like this seems useful.

- The behavior that a_s_t_m allows non-superusers to do DML on system
catalogs should be removed.  (Regular permissions can be used for that.)

- Things that are useful in normal use, for example SET STATISTICS, some
or all reloptions, should always be allowed (subject to other access
control).

- There is currently no support in pg_dump to preserve any of those
changes.  Maybe that's not a big problem.

- Dead code or code that is redundant with pinning should be removed.

Any other thoughts?


[0]:
https://www.postgresql.org/message-id/flat/e49f825b-fb25-0bc8-8afc-d5ad895c7975%402ndquadrant.com

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d9143b1ca4d00ac90b1cd8cd6eefba86cb4685b6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 21 Jun 2019 10:24:03 +0200
Subject: [PATCH v1 1/3] Change allow_system_table_mods to SUSET

---
 doc/src/sgml/config.sgml | 2 +-
 src/backend/utils/misc/guc.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 84341a30e5..de19519b20 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9298,7 +9298,7 @@ Developer Options

 Allows modification of the structure of system tables.
 This is used by initdb.
-This parameter can only be set at server start.
+Only superusers can change this setting.

   
  
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1208eb9a68..025dcc1d87 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1772,7 +1772,7 @@ static struct config_bool ConfigureNamesBool[] =
},
 
{
-   {"allow_system_table_mods", PGC_POSTMASTER, DEVELOPER_OPTIONS,
+   {"allow_system_table_mods", PGC_SUSET, DEVELOPER_OPTIONS,
gettext_noop("Allows modifications of the structure of 
system tables."),
NULL,
GUC_NOT_IN_SAMPLE
-- 
2.22.0

From 97515b52944b663c068a8dee763e41855739fd56 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 21 Jun 2019 10:24:03 +0200
Subject: [PATCH v1 2/3] Add tests for allow_system_table_mods

---
 .../regress/expected/alter_system_table.out   | 147 
 src/test/regress/parallel_schedule|   3 +
 src/test/regress/sql/alter_system_table.sql   | 165 ++
 3 files changed, 315 insertions(+)
 create mode 100644 src/test/regress/expected/alter_system_table.out
 create mode 100644 src/test/regress/sql/alter_system_table.sql

diff --git a/src/test/regress/expected/alter_system_table.out 
b/src/test/regress/expected/alter_system_table.out
new file mode 100644
index 00..2550e75b2c
--- /dev/null
+++ b/src/test/regress/expected/alter_system_table.out
@@ -0,0 +1,147 @@
+CREATE USER regress_user_ast;
+SET allow_system_table_mods = off;
+-- create new table in pg_catalog
+CREATE TABLE pg_catalog.test (a int);
+ERROR:  permission denied to create "pg_catalog.test"
+DETAIL:  System catalog modifications are currently disallowed.
+-- anyarray column
+CREATE TABLE t1x (a int, b anyarray);
+ERROR:  column "b" has pseudo-type anyarray
+-- index on system catalog
+CREATE INDEX ON pg_class (relnamespace);
+ERROR:  permission denied: "pg_class" is a system catalog
+-- index on system catalog
+ALTER TABLE pg_namespace ADD UNIQUE USING INDEX pg_nam

Re: Choosing values for multivariate MCV lists

2019-06-21 Thread Dean Rasheed
On Thu, 20 Jun 2019 at 23:35, Tomas Vondra  wrote:
>
> On Thu, Jun 20, 2019 at 06:55:41AM +0100, Dean Rasheed wrote:
>
> >I'm not sure it's easy to justify ordering by Abs(freq-base_freq)/freq
> >though, because that would seem likely to put too much weight on the
> >least commonly occurring values.
>
> But would that be an issue, or a good thing? I mean, as long as the item
> is above mincount, we take the counts as reliable. As I explained, my
> motivation for proposing that was that both
>
>... (cost=... rows=1 ...) (actual=... rows=101 ...)
>
> and
>
>... (cost=... rows=100 ...) (actual=... rows=200 ...)
>
> have exactly the same Abs(freq - base_freq), but I think we both agree
> that the first misestimate is much more dangerous, because it's off by two
> orders of magnitude.
>

Hmm, that's a good example. That definitely suggests that we should be
trying to minimise the relative error, but also perhaps that what we
should be looking at is actually just the ratio freq / base_freq,
rather than their difference.

Regards,
Dean




Re: Disconnect from SPI manager on error

2019-06-21 Thread RekGRpth
>It is not plpgsql's job to clean up after other backend subsystems
during a transaction abort.
But plpgsql does clean up on success! I suggest doing the cleanup on
exception as well.


чт, 20 июн. 2019 г. в 20:33, Tom Lane :

> RekGRpth  writes:
> > A patch fixing this bug
> >
> https://www.postgresql.org/message-id/flat/15738-21723084f3009ceb%40postgresql.org
>
> I do not think this code change is necessary or appropriate.
> It is not plpgsql's job to clean up after other backend subsystems
> during a transaction abort.  Maybe if plpgsql were the only thing
> that invokes spi.c, it would be sane to factorize the responsibility
> this way --- but of course it is not.
>
> The complaint in bug #15738 is 100% bogus, which is probably why
> it was roundly ignored.  The quoted C code is just plain wrong
> about how to handle errors inside the backend.  In particular,
> SPI_rollback is not even approximately the right thing to do to
> clean up after catching a thrown exception.
>
> regards, tom lane
>


Re: benchmarking Flex practices

2019-06-21 Thread John Naylor
On Fri, Jun 21, 2019 at 12:02 AM Andres Freund  wrote:
> Might be worth also testing with a more repetitive testcase to measure
> both cache locality and branch prediction. I assume that with
> information_schema there's enough variability that these effects play a
> smaller role. And there's plenty real-world cases where there's a *lot*
> of very similar statements being parsed over and over. I'd probably just
> measure the statements pgbench generates or such.

I tried benchmarking with a query string containing just

BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 1;
SELECT abalance FROM pgbench_accounts WHERE aid = 1;
UPDATE pgbench_tellers SET tbalance = tbalance + 1 WHERE tid = 1;
UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 1;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (1,
1, 1, 1, CURRENT_TIMESTAMP);
END;

repeated about 500 times. With this, backtracking is about 3% slower:

HEAD:
1.15s

patch:
1.19s

patch + huge array:
1.19s

That's possibly significant enough to be evidence for your assumption,
as well as to persuade us to keep things as they are.

On Thu, Jun 20, 2019 at 10:52 PM Tom Lane  wrote:
> Huh.  That's really interesting, because removing backtracking was a
> demonstrable, significant win when we did it [1].  I wonder what has
> changed?  I'd be prepared to believe that today's machines are more
> sensitive to the amount of cache space eaten by the tables --- but that
> idea seems contradicted by your result that the table size isn't
> important.  (I'm wishing I'd documented the test case I used in 2005...)

It's possible the code used with backtracking is better predicted than
15 years ago, but my uneducated hunch is our Bison grammar has gotten
much worse in cache misses and branch prediction than the scanner has
in 15 years. That, plus the recent keyword lookup optimization might
have caused parsing to be completely dominated by Bison. If that's the
case, the 3% slowdown above could be a significant portion of scanning
in isolation.

> Hm.  Smaller binary is definitely nice, but 31763 is close enough to
> 32768 that I'd have little faith in the optimization surviving for long.
> Is there any way we could buy back some more transitions?

I tried quickly ripping out the unicode escape support entirely. It
builds with warnings, but the point is to just get the size -- that
produced an array with only 28428 elements, and that's keeping all the
no-backup rules intact. This might be unworkable and/or ugly, but I
wonder if it's possible to pull unicode escape handling into the
parsing stage, with "UESCAPE" being a keyword token that we have to
peek ahead to check for. I'll look for other rules that could be more
easily optimized, but I'm not terribly optimistic.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




using explicit_bzero

2019-06-21 Thread Peter Eisentraut
In a recent thread[0], the existence of explicit_bzero() was mentioned.
I went to look where we could use that to clear sensitive information
from memory and found a few candidates:

- In be-secure-common.c, clear the entered SSL passphrase in the error
path.  (In the non-error path, the buffer belongs to OpenSSL.)

- In libpq, clean up after reading .pgpass.  Otherwise, the entire file
including all passwords potentially remains in memory.

- In libpq, clear the password after a connection is closed
(freePGconn/part of PQfinish).

- pg_hba.conf could potentially contain passwords for LDAP, so that
should maybe also be cleared, but the structure of that code would make
that more involved, so I skipped that for now.  Efforts are probably
better directed at providing facilities to avoid having to do that.[1]

Any other ones?

A patch that implements the first three is attached.


[0]:
https://www.postgresql.org/message-id/043403c2-f04d-3a69-aa8a-9bb7b9ce8...@iki.fi
[1]:
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ44ssWhcKP1KYK2Dm9_XXk1_b629_qSDUhH1fWfuAvXg%40mail.gmail.com

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 5cd9ed8e0cc5fae18657acb033bbdb6fcba90f1b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 12 Jun 2019 08:58:22 +
Subject: [PATCH] Use explicit_bzero

---
 configure| 2 +-
 configure.in | 1 +
 src/backend/libpq/be-secure-common.c | 1 +
 src/include/pg_config.h.in   | 3 +++
 src/include/port.h   | 4 
 src/interfaces/libpq/fe-connect.c| 8 
 6 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index fd61bf6472..3fa8ec0603 100755
--- a/configure
+++ b/configure
@@ -15176,7 +15176,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt clock_gettime copyfile fdatasync getifaddrs getpeerucred 
getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat 
pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open 
strchrnul strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
+for ac_func in cbrt clock_gettime copyfile explicit_bzero fdatasync getifaddrs 
getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat 
pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open 
strchrnul strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 4586a1716c..7488f4519f 100644
--- a/configure.in
+++ b/configure.in
@@ -1615,6 +1615,7 @@ AC_CHECK_FUNCS(m4_normalize([
cbrt
clock_gettime
copyfile
+   explicit_bzero
fdatasync
getifaddrs
getpeerucred
diff --git a/src/backend/libpq/be-secure-common.c 
b/src/backend/libpq/be-secure-common.c
index 877226d377..4c1c6cb3c4 100644
--- a/src/backend/libpq/be-secure-common.c
+++ b/src/backend/libpq/be-secure-common.c
@@ -118,6 +118,7 @@ run_ssl_passphrase_command(const char *prompt, bool 
is_server_start, char *buf,
buf[--len] = '\0';
 
 error:
+   explicit_bzero(buf, size);
pfree(command.data);
return len;
 }
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 6cd4cfed0a..0062a4a426 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -201,6 +201,9 @@
 /* Define to 1 if you have the  header file. */
 #undef HAVE_EDITLINE_READLINE_H
 
+/* Define to 1 if you have the `explicit_bzero' function. */
+#undef HAVE_EXPLICIT_BZERO
+
 /* Define to 1 if you have the `fdatasync' function. */
 #undef HAVE_FDATASYNC
 
diff --git a/src/include/port.h b/src/include/port.h
index b5c03d912b..af754b261c 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -381,6 +381,10 @@ extern int isinf(double x);
 #endif /* __clang__ && 
!__cplusplus */
 #endif /* !HAVE_ISINF */
 
+#ifndef HAVE_EXPLICIT_BZERO
+#define explicit_bzero(b, len) bzero(b, len)
+#endif
+
 #ifndef HAVE_STRTOF
 extern float strtof(const char *nptr, char **endptr);
 #endif
diff --git a/src/interfaces/libpq/fe-connect.c 
b/src/interfaces/libpq/fe-connect.c
index e58fa6742a..67ea169397 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -3886,7 +3886,10 @@ freePGconn(PGconn *conn)
if (conn->connhost[i].port != NULL)
free(conn->connhost[i].port);
if (conn->connhost[i].password != NULL)
+   {
+   explicit_bzero(conn->connhost[i].password, 
strlen(conn->connhost[i].password));
   

Re: Multivariate MCV list vs. statistics target

2019-06-21 Thread Dean Rasheed
On Thu, 20 Jun 2019 at 23:12, Tomas Vondra  wrote:
>
> On Thu, Jun 20, 2019 at 08:08:44AM +0100, Dean Rasheed wrote:
> >On Tue, 18 Jun 2019 at 22:34, Tomas Vondra  
> >wrote:
> >>
> >> So I'm thinking we should allow tweaking the statistics for extended
> >> stats, and serialize it in the pg_statistic_ext catalog. Any opinions
> >> why that would be a bad idea?
> >
> >Seems reasonable to me. This might not be the only option we'll ever
> >want to add though, so perhaps a "stxoptions text[]" column along the
> >lines of a relation's reloptions would be the way to go.
>
> I don't know - I kinda dislike the idea of stashing stuff like this into
> text[] arrays unless there's a clear need for such flexibility (i.e.
> vision to have more such options). Which I'm not sure is the case here.
> And we kinda have a precedent in pg_attribute.attstattarget, so I'd use
> the same approach here.
>

Hmm, maybe. I can certainly understand your dislike of using text[].
I'm not sure that we can confidently say that multivariate statistics
won't ever need additional tuning knobs, but I have no idea at the
moment what they might be, and nothing else has come up so far in all
the time spent considering MCV lists and histograms, so maybe this is
OK.

Regards,
Dean