Re: [HACKERS] Variable substitution in psql backtick expansion

2017-11-12 Thread Tom Lane
Pavel Stehule <pavel.steh...@gmail.com> writes:
> [ psql-server-version-2.patch ]

I think this patch should be rejected.  It adds no new functionality;
you can get the string in question with "select version()".  Moreover,
you've been able to do that for lo these many years.  Any application
that tried to depend on this new way of getting the string would fail
when working with an older server or older psql.  That does not seem
like a good property for a version check.  Also, because the string
isn't especially machine-friendly, it's not very clear to me what the
use-case is for an application to use it at all, rather than the other
version formats we already provide.
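
For reference, the version formats we do already provide, all of which
work against servers of any vintage:

    select version();          -- the very same human-readable string
    show server_version;       -- version number as text, e.g. "10.1"
    show server_version_num;   -- machine-comparable integer, e.g. 100001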

        regards, tom lane


Re: [HACKERS] Fix number skipping in to_number

2017-11-12 Thread Tom Lane
Oliver Ford <ojf...@gmail.com> writes:
> [ 0001-apply-number-v3.patch ]

I looked at this patch briefly and have a couple of comments:

* It seems entirely wrong to be matching to L_thousands_sep in the
NUM_COMMA case; that format code is by definition not locale aware,
so it should be matching to plain ',' independently of locale.

* Don't we need to fix the NUM_L (currency symbol) case in the
same manner?  (The NUM_D and NUM_S cases are handled in
NUM_numpart_from_char and seem ok at a quick glance.)

* I'm not in love with the noadd flag.  Other places in this
switch that want to skip the final increment do it with
"continue", and I think this should do likewise.

        regards, tom lane


Re: [HACKERS] PSA: don't be in a hurry to update to XCode 9.0

2017-11-12 Thread Tom Lane
Dave Cramer <p...@fastcrypt.com> writes:
> Did you ever find a solution to this without updating?

No.  I filed a bug report which Apple seems uninterested in, perhaps
not too surprisingly.

regards, tom lane


Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks

2017-11-11 Thread Tom Lane
Stephen Frost <sfr...@snowman.net> writes:
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> Forcing an admin to give full superuser rights to a user who needs to
>> work only on LO import and export is a wrong concept.

> The problem I have with this is that 'full superuser rights' actually
> are being granted by this.  You're able to read any file and write any
> file on the filesystem that the PG OS user can.  How is that not 'full
> superuser rights'?

It doesn't cause, say, "DELETE FROM pg_proc;" to succeed.

You're still not getting the point that this is for people who want
self-imposed restrictions.  Sure, you can't give out lo_export privilege
to someone you would not trust with superuser.  But somebody who needs
lo_export, and is trustworthy enough to have it, may still wish to do
the inside-the-database part of their work with less than full superuser,
just as a safety measure.  It's the *exact same* reason why cautious
people use "sudo" rather than just running as root all the time.

> As I mentioned up-thread, if we want to make it so that non-superusers
> can use lo_import/lo_export, then we should do that in a way that
> allows the administrator to specify exactly the rights they really want
> to give, based on a use-case for them.

Our current answer for that is wrapper functions.  This patch does not
make that answer any less applicable than before.

> I wonder what would happen if we allow this and then someone comes along
> and re-proposes the 'CREATE DIRECTORY' concept that I had once proposed
> (ideally with things cleaned up and tightened up to avoid there being
> issues using it) to address the actual use-case for these functions to
> be available to a non-superuser.  We wouldn't be able to change these
> functions to start checking the 'directory' rights or we would break
> existing non-superuser usage of them.

This is a straw man argument --- if we tighten up the function to check
this as-yet-nonexistent feature, how is that not breaking existing
use-cases anyway?

regards, tom lane


Re: [HACKERS] parallelize queries containing initplans

2017-11-10 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> On Fri, Nov 10, 2017 at 2:24 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Also, I wonder whether the InvalidOid hack in SS_assign_special_param
>> requires commentary.  It might be safer to use a valid type OID there,
>> perhaps VOIDOID or INTERNALOID.

> There is existing precedent for using InvalidOid to denote the absence
> of a parameter -- cf. copyParamList, SerializeParamList.

OK, fair enough.  I was concerned that this was adding a corner case
not otherwise seen with Params, but evidently not.

    regards, tom lane


Re: [HACKERS] parallelize queries containing initplans

2017-11-10 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> I decided to try instead teaching the planner to keep track of the
> types of PARAM_EXEC parameters as they were created, and that seems to
> work fine.  See 0001, attached.

I did not look at the other part, but 0001 looks reasonable to me.
I might quibble with the grammar in the generate_new_param comment:

- * need to record the PARAM_EXEC slot number as being allocated.
+ * need to make sure we have record the type in paramExecTypes (otherwise,
+ * there won't be a slot allocated for it).
  */

I'd just go with "need to record the type in ..."

Also, I wonder whether the InvalidOid hack in SS_assign_special_param
requires commentary.  It might be safer to use a valid type OID there,
perhaps VOIDOID or INTERNALOID.

    regards, tom lane


Re: [HACKERS] Fix bloom WAL tap test

2017-11-10 Thread Tom Lane
I wrote:
> Is there anything we can do to cut the runtime of the TAP test to
> the point where running it by default wouldn't be so painful?

As an experiment, I tried simply cutting the size of the test table 10X:

diff --git a/contrib/bloom/t/001_wal.pl b/contrib/bloom/t/001_wal.pl
index 1b319c9..566abf9 100644
--- a/contrib/bloom/t/001_wal.pl
+++ b/contrib/bloom/t/001_wal.pl
@@ -57,7 +57,7 @@ $node_standby->start;
 $node_master->safe_psql("postgres", "CREATE EXTENSION bloom;");
 $node_master->safe_psql("postgres", "CREATE TABLE tst (i int4, t text);");
 $node_master->safe_psql("postgres",
-"INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM 
generate_series(1,10) i;"
+"INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM 
generate_series(1,1) i;"
 );
 $node_master->safe_psql("postgres",
"CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);");
@@ -72,7 +72,7 @@ for my $i (1 .. 10)
test_index_replay("delete $i");
$node_master->safe_psql("postgres", "VACUUM tst;");
test_index_replay("vacuum $i");
-   my ($start, $end) = (100001 + ($i - 1) * 10000, 100000 + $i * 10000);
+   my ($start, $end) = (10001 + ($i - 1) * 1000, 10000 + $i * 1000);
$node_master->safe_psql("postgres",
 "INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM 
generate_series($start,$end) i;"
);

This about halved the runtime of the TAP test, and it changed the coverage
footprint not at all according to lcov.  (Said coverage is only marginally
better than what we get without running the bloom TAP test, AFAICT.)

It seems like some effort could be put into both shortening this test
and improving the amount of code it exercises.

regards, tom lane


Re: [HACKERS] Fix bloom WAL tap test

2017-11-10 Thread Tom Lane
Michael Paquier <michael.paqu...@gmail.com> writes:
> On Thu, Nov 9, 2017 at 7:51 PM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
>> OK, then so be it :)

> Thanks for the new version. This one, as well as the switch to
> safe_psql in 
> https://www.postgresql.org/message-id/CAPpHfduxgEYF_0BTs-mxGC4=w5sw8rnUbq9BSTp1Wq7=nwr...@mail.gmail.com
> are good for a committer lookup IMO.

The safe_psql change is a clear bug fix, so I've pushed it.

However, as far as adding the TAP test to the standard test suite
goes, I've got two complaints about this patch:

1. It doesn't (I think, can't test) do anything to run the test on
Windows.

2. The TAP test is enormously slower than the standard test.  On my
development workstation, "make installcheck" in contrib/bloom takes
about 0.5 sec; this patch increases that to 11.6 sec.  I'm not too
happy about that as a developer, and even less so as an owner of some
fairly slow buildfarm critters.  I'd say that this might be the
reason the TAP test wasn't part of the standard tests to begin with.

Is there anything we can do to cut the runtime of the TAP test to
the point where running it by default wouldn't be so painful?

regards, tom lane


Re: [HACKERS] Add some const decorations to prototypes

2017-11-10 Thread Tom Lane
Fabien COELHO <coe...@cri.ensmp.fr> writes:
>> LWLockTrancheArray = (char **)
>>     MemoryContextAllocZero(TopMemoryContext,
>>                            LWLockTranchesAllocated * sizeof(char *));

> After your explanation, and on third thoughts, ISTM that the assignment 
> should not include "const" in the explicit cast,

Can't get terribly excited about that one way or the other.  I think
the statement would be OK as-is, and it would also be fine as

LWLockTrancheArray = (const char **)
    MemoryContextAllocZero(TopMemoryContext,
                           LWLockTranchesAllocated * sizeof(const char *));

The other two possible combinations are not good of course --- not that
they'd generate invalid code, but that they'd require readers to expend
brain cells convincing themselves that the code wasn't wrong.

> ... and moreover the compiler does not 
> complain without the const.

Arguing on the basis of what your compiler does is a pretty shaky basis.
It's not impossible that someone else's compiler would complain if the
casted-to type isn't identical to the variable's type.  I tend to agree
that a compiler *should* allow "char **" to be cast to "const char **"
silently, but that isn't necessarily what happens in the real world.
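
To illustrate with a standalone fragment (diagnostics vary by compiler
and warning flags):

    void
    example(void)
    {
        char      **p = (char **) 0;
        const char **q;

        q = (const char **) p; /* explicit cast: accepted everywhere */
        q = p;                 /* no cast: a constraint violation in C;
                                * gcc merely warns, but a stricter
                                * compiler could reject it outright */
        (void) q;
    }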

regards, tom lane


Re: [HACKERS] Add some const decorations to prototypes

2017-11-10 Thread Tom Lane
Fabien COELHO <coe...@cri.ensmp.fr> writes:
>>> ISTM That there is still at least one strange cast:
>>>> +static const char **LWLockTrancheArray = NULL;
>>>> +   LWLockTrancheArray = (const char **) // twice

>> These are not cases of "cheating".  This is just the return value of a
>> memory allocation function being cast from void * to the appropriate
>> result type.  That is an orthogonal style decision that I have
>> maintained in these cases.

> Would it make sense that the function returns "const void *", i.e. the 
> cast is not on the const part but on the pointer type part?

Certainly not -- if malloc-like functions returned "const void *" then
you could never write on allocated space without having casted away
const.

In any case, what's shown above is not involving any funny stuff,
because the type of LWLockTrancheArray is pointer to non-const
array of pointers to const char strings.  So it's correct to be
assigning a non-const pointer to it.  If it were written like
"const char * const *" then the issues would be quite different.

As for your followup --- yes, we could just omit the cast altogether
and the compiler wouldn't complain, but that is not better style IMO.
The value of the cast really is to document what type we're expecting
the expression to produce.  In context, that statement (today) is

LWLockTrancheArray = (char **)
    MemoryContextAllocZero(TopMemoryContext,
                           LWLockTranchesAllocated * sizeof(char *));

and the reader can see by inspection that the calculation of how much
to allocate (so many char-* items) is consistent with the char-**
result.  It is not necessary to go find the declaration of
LWLockTrancheArray and verify that it's char **, because we trust the
compiler to whine if char ** isn't assignment-compatible with that.
But if we left off the cast and just assigned the function result directly
to the variable, then there would be nothing directly tying the variable's
type to this allocation computation.

regards, tom lane


Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks

2017-11-10 Thread Tom Lane
Michael Paquier <michael.paqu...@gmail.com> writes:
> That will not sound much as a surprise as I spawned the original
> thread, but like Robert I understand that getting rid of all superuser
> checks is a goal that we are trying to reach to allow admins to have
> more flexibility in handling permissions to a subset of objects.
> Forcing an admin to give full superuser rights to a user who needs to
> work only on LO import and export is a wrong concept.

Right.  I think the question may boil down to how we document this.
The current para reads

The server-side lo_import and
lo_export functions behave considerably differently
from their client-side analogs.  These two functions read and write files
in the server's file system, using the permissions of the database's
owning user.  Therefore, their use is restricted to superusers.  In
contrast, the client-side import and export functions read and write files
in the client's file system, using the permissions of the client program.
The client-side functions do not require superuser privilege.

I think as far as that goes, we can just change to "Therefore, by default
their use is restricted ...".  Then I suggest adding a <para>
after that, with wording along the lines of

It is possible to GRANT use of server-side lo_import and lo_export to
non-superusers, but careful consideration of the security implications
is required.  A malicious user of such privileges could easily parlay
them into becoming superuser (for example by rewriting server
configuration files), or could attack the rest of the server's file
system without bothering to obtain database superuser privileges as
such.  Access to roles having such privilege must therefore be guarded
just as carefully as access to superuser roles.  Nonetheless, if use
of server-side lo_import or lo_export is needed for some routine task,
it's safer to use a role of this sort than full superuser privilege,
as that helps to reduce the risk of damage from accidental errors.

We could expand that by mentioning the possibility of wrapper functions,
but it seems long enough already.
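
For illustration, the sort of grant under discussion would be simply
(role name invented):

    GRANT EXECUTE ON FUNCTION lo_import(text) TO lo_admin;
    GRANT EXECUTE ON FUNCTION lo_export(oid, text) TO lo_admin;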

Comments?

    regards, tom lane


Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks

2017-11-09 Thread Tom Lane
Stephen Frost <sfr...@snowman.net> writes:
> I'm guessing no, which essentially means that *we* consider access to
> lo_import/lo_export to be equivalent to superuser and therefore we're
> not going to implement anything to try and prevent the user who has
> access to those functions from becoming superuser.  If we aren't willing
> to do that, then how can we really say that there's some difference
> between access to these functions and being a superuser?

We seem to be talking past each other.  Yes, if a user has malicious
intentions, it's possible to parlay lo_export into obtaining a superuser
login (I'm less sure that that's necessarily true for lo_import).
That does NOT make it "equivalent", except perhaps in the view of someone
who is only considering blocking malevolent actors.  It does not mean that
there's no value in preventing a task that needs to run lo_export from
being able to accidentally destroy any data in the database.  There's a
range of situations where you are concerned about accidents and errors,
not malicious intent; but your argument ignores those use-cases.

To put it more plainly: your argument is much like saying that a person
who knows a sudo password might as well do everything they ever do as
root.

    regards, tom lane


Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks

2017-11-09 Thread Tom Lane
Michael Paquier <michael.paqu...@gmail.com> writes:
> On Fri, Nov 10, 2017 at 7:05 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I did miss the need to fix the docs, and am happy to put in some strong
>> wording about the security hazards of these functions while fixing the
>> docs.  But I do not think that leaving them with hardwired superuser
>> checks is an improvement over being able to control them with GRANT.

> Sorry about that. lobj.sgml indeed mentions superusers. Do you need a patch?

No, I can write it.  But I'm going to wait to see where this debate
settles before expending effort on the docs.

regards, tom lane


Re: [HACKERS] pageinspect option to forgo buffer locking?

2017-11-09 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> On Thu, Nov 9, 2017 at 12:58 PM, Andres Freund <and...@anarazel.de> wrote:
>> You can already pass arbitrary byteas to heap_page_items(), so I don't
>> see how this'd change anything exposure-wise? Or are you thinking that
>> users would continually do this with actual page contents and would be
>> more likely to hit edge cases than if using pg_read_binary_file() or
>> such (which obviously sees an out of date page)?

> More the latter.  It's not really an increase in terms of security
> exposure, but it might lead to more trouble in practice.

If we do this, I'd suggest exposing it as a separate SQL function
get_raw_page_unlocked() rather than as an option to get_raw_page().

The reasoning is that if we ever allow these functions to be controlled
via GRANT instead of hardwired superuser checks (cf nearby debate about
lo_import/lo_export), one might reasonably consider the unlocked form as
more risky than the locked form, and hence not wish to have to give out
privileges to both at once.
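
For illustration (the unlocked variant being hypothetical, per the
proposal above, and the role name invented):

    GRANT EXECUTE ON FUNCTION get_raw_page(text, integer) TO page_inspector;
    -- and deliberately no grant on get_raw_page_unlocked(text, integer)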

regards, tom lane


Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks

2017-11-09 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> On Thu, Nov 9, 2017 at 2:56 PM, Stephen Frost <sfr...@snowman.net> wrote:
>> Further, I agree entirely that we
>> shouldn't be deciding that certain capabilities are never allowed to be
>> given to a user- but that's why superuser *exists* and why it's possible
>> to give superuser access to multiple users.  The question here is if
>> it's actually sensible for us to make certain actions be grantable to
>> non-superusers which allow that role to become, or to essentially be, a
>> superuser.  What that does, just as it does in the table case, from my
>> viewpoint at least, is make it *look* to admins like they're grant'ing
>> something less than superuser when, really, they're actually giving the
>> user superuser-level access.  That violates the POLA because the admin
>> didn't write 'ALTER USER joe WITH SUPERUSER', they just wrote 'GRANT
>> EXECUTE ON lo_import() TO joe;'.

> This is exactly the kind of thing that I'm talking about.  Forcing an
> administrator to hand out full superuser access instead of being able
> to give just lo_import() makes life difficult for users for no good
> reason.  The existence of a theoretically-exploitable security
> vulnerability is not tantamount to really having access, especially on
> a system with a secured logging facility.

Exactly.  I think that Stephen's argument depends on what a black-hat
privilege recipient could theoretically do, but fails to consider what's
useful for white-hat users.  One of the standard rules for careful admins
is to do as little as possible as root/superuser.  If you have a situation
where it's necessary to use, say, lo_import as part of a routine data
import task, then the only alternative previously was to do that task as
superuser.  That is not an improvement, by any stretch of the imagination,
over granting lo_import privileges to some otherwise-vanilla role that can
run the data import task.

We've previously discussed workarounds such as creating SECURITY DEFINER
wrapper functions, but I don't think that approach does much to change the
terms of the debate: it'd still be better if the wrapper function itself
didn't need full superuser.
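
For the archives, a minimal sketch of such a wrapper (directory and role
names are illustrative; note the file-name sanity check):

    CREATE FUNCTION export_lo(loid oid, fname text) RETURNS integer
    LANGUAGE plpgsql SECURITY DEFINER AS $$
    BEGIN
        -- reject path separators, dotfiles, etc: writes stay in one directory
        IF fname !~ '^[A-Za-z0-9_][A-Za-z0-9_.-]*$' THEN
            RAISE EXCEPTION 'invalid file name: %', fname;
        END IF;
        RETURN lo_export(loid, '/srv/exports/' || fname);
    END
    $$;
    REVOKE ALL ON FUNCTION export_lo(oid, text) FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION export_lo(oid, text) TO export_role;

The wrapper itself still has to be owned by a superuser today, which is
exactly the annoyance mentioned above.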

I did miss the need to fix the docs, and am happy to put in some strong
wording about the security hazards of these functions while fixing the
docs.  But I do not think that leaving them with hardwired superuser
checks is an improvement over being able to control them with GRANT.

regards, tom lane


Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks

2017-11-09 Thread Tom Lane
Michael Paquier <michael.paqu...@gmail.com> writes:
> On Thu, Nov 9, 2017 at 6:05 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Another idea would be to invent a new external flag bit "INV_WRITE_ONLY",
>> so that people who wanted true write-only could get it, without breaking
>> backwards-compatible behavior.  But I'm inclined to wait for some field
>> demand to show up before adding even that little bit of complication.

> Demand that may never show up, and the current behavior on HEAD does
> not receive any complains either. I am keeping the patch simple for
> now. That's less aspirin needed for everybody.

Looks good to me, pushed.

regards, tom lane


Re: [HACKERS] (spelling) Ensure header of postgresql.auto.conf is consistent

2017-11-09 Thread Tom Lane
Fabrízio de Royes Mello <fabriziome...@gmail.com> writes:
> On Thu, Nov 9, 2017 at 06:15, Feike Steenbergen <
> feikesteenber...@gmail.com> wrote:
>> Attached a patch that ensures the header of postgresql.auto.conf is
>> consistent, whether created by initdb or recreated when ALTER SYSTEM
>> is issued.

> Interesting... IMHO this typo should be backpatched to 9.4 when ALTER
> SYSTEM was introduced.

Agreed, and done.

regards, tom lane


Re: [HACKERS] A weird bit in pg_upgrade/exec.c

2017-11-09 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> I think this odd coding was introduced recently because of the
> pg_resetxlog -> pg_resetwal renaming.

Dunno about that, but certainly somebody fat-fingered a refactoring
there.  The 9.6 code looks quite different but doesn't seem to be
doing anything silly.

        regards, tom lane


Re: [HACKERS] A weird bit in pg_upgrade/exec.c

2017-11-09 Thread Tom Lane
a.akent...@postgrespro.ru writes:
> I've come across a weird bit in pg_upgrade/exec.c

> We have a function check_bin_dir() which goes like this (old_cluster and 
> new_cluster are global variables):
> void check_bin_dir(ClusterInfo *cluster)
> {
>  ...
>  get_bin_version(&old_cluster);
>  get_bin_version(&new_cluster);
>  ...
> }

> This function has two calls:
> check_bin_dir(&old_cluster);
> check_bin_dir(&new_cluster);

> I'd like to substitute these last two lines with this:
> get_bin_version(cluster);

Yeah, the way it is now seems outright broken.  It will try to do
get_bin_version on the new cluster before having done validate_exec
there, violating its own comment.

I think we should change this as a bug fix, independently of whatever
else you had in mind to do here.

regards, tom lane


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-09 Thread Tom Lane
Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> On Thu, Nov 9, 2017 at 6:27 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Is there really much interest in Bonjour support on non-macOS platforms?
>> I hadn't heard that anybody but Apple was invested in it.

> Not from me.  My only interest here was to pipe up because I knew that
> what was originally proposed would have broken stuff on macOS, and
> after that piqued curiosity.  I won't mind at all if you revert the
> commit to prevent confusion.  If the intersection of FreeBSD,
> PostgreSQL and Bonjour users is a non-empty set, [s]he might at least
> find this archived discussion useful...

Not hearing anyone else speaking up for this, I'll go revert the
configure change, and instead put in a comment pointing out that
Avahi support would require a lot more than an extra -l switch.

regards, tom lane


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-11-09 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> No, that's not right.  Now that you mention it, I realize that tuple
> locks can definitely cause deadlocks.  Example:

Yeah.  Foreign-key-related tuple locks are another rich source of
examples.

> ... So I don't
> think we can remove speculative insertion locks from the deadlock
> detector either.

That scares me too.  I think that relation extension can safely
be transferred to some lower-level mechanism, because what has to
be done while holding the lock is circumscribed and below the level
of database operations (which might need other locks).  These other
ideas seem a lot riskier.

(But see recent conversation where I discouraged Alvaro from holding
extension locks across BRIN summarization activity.  We'll need to look
and make sure that nobody else has had creative ideas like that.)

    regards, tom lane


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-08 Thread Tom Lane
Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> On Thu, Nov 9, 2017 at 5:03 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Is the AC_SEARCH_LIBS configure call needed to make PG build with the
>> FreeBSD package?

> Yes.  My take is that the commit was correct: the library is needed
> for --with-bonjour to work on non-macOS systems, and apparently it can
> work (though I didn't personally try to assess that beyond seeing that
> it could start up and connect to mdnsd).  Perhaps Avahi doesn't
> qualify as a suitable Bonjour implementation any more though, and
> someone out there might like to consider writing a --with-avahi option
> that uses the native API it's shouting about.

I'm not sure what to do at this point.  I concur that the AC_SEARCH_LIBS
call is helpful if you're using mDNSResponder on FreeBSD (or wherever
else that may be available) ... but I'm worried that it will enable
people to create broken builds on Linux without trying very hard.
We might be wise to deem that putting that call in is just creating
an attractive nuisance.

This would certainly be easier if we had a certifiably-working interface
to the avahi library.  But we don't, and I don't plan to write one,
and I doubt anyone else will come out of the woodwork to do it either.

Is there really much interest in Bonjour support on non-macOS platforms?
I hadn't heard that anybody but Apple was invested in it.

regards, tom lane


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-08 Thread Tom Lane
Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> On Thu, Nov 9, 2017 at 1:39 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Hm, the library on F25 is also avahi's.  Digging in the archives, I find
>> this old thread reporting the same behavior:
>> https://www.postgresql.org/message-id/flat/17824.1252293423%40sss.pgh.pa.us

> FWIW it builds and starts up fine on FreeBSD with
> mDNSResponder-878.1.1 installed (Apache-licensed Apple Bonjour code)
> and the mdnsd daemon started.  If I don't start mdnsd it shows an
> error at startup.  When built against the (conflicting)
> avahi-libdns-0.6.31_2 package it shows the WARNING you reported and
> "DNSServiceRegister() failed: error code -65537", which might just
> mean it wants to talk to some daemon I'm not running.

Interesting.  Fedora doesn't seem to package mDNSResponder as such ---
"dnf search mDNSResponder" just returns more pointers to avahi:

avahi-compat-libdns_sd.x86_64 : Libraries for Apple Bonjour mDNSResponder
  : compatibility
avahi-compat-libdns_sd.i686 : Libraries for Apple Bonjour mDNSResponder
: compatibility
avahi-compat-libdns_sd-devel.x86_64 : Header files for the Apple Bonjour
: mDNSResponder compatibility libraries
avahi-compat-libdns_sd-devel.i686 : Header files for the Apple Bonjour
  : mDNSResponder compatibility libraries

Is the AC_SEARCH_LIBS configure call needed to make PG build with the
FreeBSD package?

regards, tom lane


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-08 Thread Tom Lane
Luke Lonergan <l...@brightforge.com> writes:
> On 11/8/17, 3:00 PM, "Tom Lane" <t...@sss.pgh.pa.us> wrote:
>> BTW, when I try this on Fedora 25, it builds cleanly but the feature
>> doesn't seem to work --- I get this at postmaster start:
>> ...
>> I wonder which libdns_sd you are using.

> libavahi-compat-libdnssd1:amd64: /usr/lib/x86_64-linux-gnu/libdns_sd.so.1.0.0

Hm, the library on F25 is also avahi's.  Digging in the archives, I find
this old thread reporting the same behavior:

https://www.postgresql.org/message-id/flat/17824.1252293423%40sss.pgh.pa.us

So now I'm wondering if you know something the rest of us don't about
how to configure the platform for bonjour to work.

I'm also a bit disturbed about the report that libdns_sd was causing
the postmaster to become multithreaded.  If still true, that's quite
bad, and might be a reason to decide we don't want this change after
all.

regards, tom lane


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-08 Thread Tom Lane
Luke Lonergan <l...@brightforge.com> writes:
> Hi Tom – works for me on Linux (Ubuntu)…

BTW, when I try this on Fedora 25, it builds cleanly but the feature
doesn't seem to work --- I get this at postmaster start:

*** WARNING *** The program 'postgres' uses the Apple Bonjour compatibility layer of Avahi.
*** WARNING *** Please fix your application to use the native API of Avahi!
*** WARNING *** For more information see <http://0pointer.de/avahi-compat?s=libdns_sd&e=postgres>
2017-11-08 17:58:42.451 EST [23762] LOG:  DNSServiceRegister() failed: error code -65540

I wonder which libdns_sd you are using.

        regards, tom lane


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-08 Thread Tom Lane
Luke Lonergan <l...@brightforge.com> writes:
> Hi Tom – works for me on Linux (Ubuntu)…
> +   AC_SEARCH_LIBS([DNSServiceRefSockFD],[dns_sd])

Pushed with an error message added.  I also took the trouble to
standardize the syntax of our various AC_SEARCH_LIBS calls ---
they weren't very consistent about quoting.

    regards, tom lane


Re: [HACKERS] [PATCH] fix wrong create table statement in documentation

2017-11-08 Thread Tom Lane
jotpe <jo...@posteo.de> writes:
> In the current documentation [1] this create table statement is listed:
> CREATE TABLE measurement_y2008m01 PARTITION OF measurement
>  FOR VALUES FROM ('2008-01-01') TO ('2008-02-01')
>  TABLESPACE fasttablespace
>  WITH (parallel_workers = 4);

Yup, that's wrong.  Fix pushed, thanks!

    regards, tom lane


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-08 Thread Tom Lane
Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> On Thu, Nov 9, 2017 at 10:05 AM, Luke Lonergan <l...@brightforge.com> wrote:
>> +   AC_CHECK_LIB(dns_sd, DNSServiceRefSockFD, [], [AC_MSG_ERROR([library
>> 'dns_sd' is required for Bonjour])])

> It lives in libSystem.dylib (implicitly linked) on macOS, so that
> would break the build there.  We'd need something a bit more
> conditional, but I don't know what.

A quick look at the Autoconf manual finds:

 `AC_CHECK_LIB' requires some care in usage, and should be avoided
 in some common cases.  Many standard functions like `gethostbyname'
 appear in the standard C library on some hosts, and in special
 libraries like `nsl' on other hosts.  On some hosts the special
 libraries contain variant implementations that you may not want to
 use.  These days it is normally better to use
 `AC_SEARCH_LIBS([gethostbyname], [nsl])' instead of
 `AC_CHECK_LIB([nsl], [gethostbyname])'.

If Luke wants to check that that works for him, I can check it on
macOS.
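
Concretely, something along these lines (untested; error wording to taste):

    AC_SEARCH_LIBS(DNSServiceRefSockFD, dns_sd, [],
                   [AC_MSG_ERROR([library 'dns_sd' is required for Bonjour])])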

    regards, tom lane


Re: [HACKERS] OpenTemporaryFile() vs resowner.c

2017-11-08 Thread Tom Lane
Thomas Munro <thomas.mu...@enterprisedb.com> writes:
> Andres, Robert and Peter G rightly complained[1] that my shared
> temporary file patch opens a file, then calls
> ResourceOwnerEnlargeFiles() which can fail due to lack of memory, and
> then registers the file handle to make sure we don't leak it.  Doh.
> The whole point of the separate ResourceOwnerEnlargeXXX() interface is
> to be able to put it before resource acquisition.

> The existing OpenTemporaryFile() coding has the same mistake.  Please
> see attached.

Pushed.  I grepped for related problems and found that IncrBufferRefCount
was also living dangerously, though in a different way: it remembered
a refcount it hadn't actually applied yet.
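
For the archives, the ordering being enforced, schematically (backend
code, not the exact committed diff):

    /* Reserve resource-owner space *before* acquiring the resource, so
     * that the only step that can fail happens while there is nothing
     * to leak; the Remember step afterwards cannot fail. */
    ResourceOwnerEnlargeFiles(CurrentResourceOwner);

    file = OpenTemporaryFileInTablespace(tblspcOid, true);

    ResourceOwnerRememberFile(CurrentResourceOwner, file);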

        regards, tom lane


Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks

2017-11-08 Thread Tom Lane
Michael Paquier <michael.paqu...@gmail.com> writes:
> On Tue, Sep 26, 2017 at 11:42 AM, Vaishnavi Prabakaran
> <vaishnaviprabaka...@gmail.com> wrote:
>> I moved the cf entry to "ready for committer", and though my vote is for
>> keeping the existing API behavior with write implying read, I let the
>> committer decide whether the following behavior change is Ok or not.
>> "Reading from a large-object descriptor opened with INV_WRITE is NOT
>> possible"

> Thanks for the review!

After chewing on this some more, I'm inclined to agree that we should
not change the behavior of the read/write flags.  There's been no
field requests for a true-write-only mode, so I think we're much more
likely to get complaints from users whose code we broke than plaudits
from people who think the change is helpful.

I believe it would be easy enough to adjust the patch so that we can
still have the refactoring benefits; we'd just need the bit that
translates from external to internal flags to go more like

    if (flags & INV_WRITE)
        descflags |= IFS_WRLOCK | IFS_RDLOCK;
    if (flags & INV_READ)
        descflags |= IFS_RDLOCK;

(Preferably with a comment about why it's like this.)
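
For reference, the user-visible behavior this preserves (16403 standing
in for a real large object OID; INV_WRITE is 0x20000 in libpq-fs.h):

    SELECT loread(lo_open(16403, x'20000'::int), 100);
    -- opened with INV_WRITE alone, yet reading still works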

Another idea would be to invent a new external flag bit "INV_WRITE_ONLY",
so that people who wanted true write-only could get it, without breaking
backwards-compatible behavior.  But I'm inclined to wait for some field
demand to show up before adding even that little bit of complication.

regards, tom lane


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-08 Thread Tom Lane
Claudio Freire <klaussfre...@gmail.com> writes:
> What's perhaps not clear is whether there are better ideas. Like
> rebuilding the page as Tom proposes, which doesn't seem like a bad
> idea. Bucket sort already is O(bytes), just as memcopy, only it has a
> lower constant factor (it's bytes/256 in the original patch), which
> might make copying the whole page an extra time lose against bucket
> sort in a few cases.

> Deciding that last point does need more benchmarking. That doesn't
> mean the other improvements can't be pursued in the meanwhile, right?

Well, I doubt we're going to end up committing more than one of these
ideas.  The question is which way is best.  If people are willing to
put in the work to test all of them, let's do it.

BTW, it strikes me that in considering the rebuild-the-page approach,
we should not have blinders on and just measure the speed of
PageRepairFragmentation.  Rather, we should take a look at what happens
subsequently given a physically-ordered set of tuples.  I can recall
Andres or someone moaning awhile ago about lack of locality of access in
index page searches --- maybe applying that approach while vacuuming
indexes will help?

        regards, tom lane


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-08 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> Just to throw a random idea out here, we currently have
> gen_qsort_tuple.pl producing qsort_tuple() and qsort_ssup().  Maybe it
> could be modified to also produce a specialized qsort_itemids().  That
> might be noticeably faster that our general-purpose qsort() for the
> reasons mentioned in the comments in gen_qsort_tuple.pl, viz:

+1 for somebody trying that (I'm not volunteering, though).

    regards, tom lane


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-11-08 Thread Tom Lane
Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
> On 10/7/17 16:46, Tom Lane wrote:
>> Accordingly I propose the attached patch.  If anyone's interested in
>> experimenting on other platforms, we might be able to refine/complicate
>> the FLUSH_DISTANCE selection further, but I think this is probably good
>> enough for a first cut.

> What is the status of this?  Is performance on High Sierra still bad?

I committed the fix at 643c27e36.  If Apple have done anything about the
underlying problem, you couldn't tell it from their non-response to my
bug report.

    regards, tom lane


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-08 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> On Tue, Nov 7, 2017 at 4:39 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> What I'm getting from the standard pgbench measurements, on both machines,
>> is that this patch might be a couple percent slower than HEAD, but that is
>> barely above the noise floor so I'm not too sure about it.

> Hmm.  It seems like slowing down single client performance by a couple
> of percent is something that we really don't want to do.

I do not think there is any change here that can be proven to always be a
win.  Certainly the original patch, which proposes to replace an O(n log n)
sort algorithm with an O(n^2) one, should not be thought to be that.
The question to focus on is what's the average case, and I'm not sure how
to decide what the average case is.  But more than two test scenarios
would be a good start.

    regards, tom lane


Re: [HACKERS] SQL procedures

2017-11-08 Thread Tom Lane
Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
> On 10/31/17 14:23, Tom Lane wrote:
>> Why not use VOIDOID for the prorettype value?

> We need a way to distinguish functions that are callable by SELECT and
> procedures that are callable by CALL.

Do procedures of this ilk belong in pg_proc at all?  It seems like a large
fraction of the attributes tracked in pg_proc are senseless for this
purpose.  A new catalog might be a better approach.

In any case, I buy none of your arguments that 0 is a better choice than a
new pseudotype.

    regards, tom lane


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Tom Lane
Peter Geoghegan <p...@bowt.ie> writes:
> My point is only that it's worth considering that this factor affects
> how representative your sympathetic case is. It's not clear how many
> PageIndexMultiDelete() calls are from opportunistic calls to
> _bt_vacuum_one_page(), how important that subset of calls is, and so
> on. Maybe it doesn't matter at all.

According to the perf measurements I took earlier, essentially all the
compactify_tuple calls in this test case are from PageRepairFragmentation
(from heap_page_prune), not PageIndexMultiDelete.

I'd be the first to agree that I doubt that test case is really
representative.  I'd been whacking around Yura's original case to
try to get PageRepairFragmentation's runtime up to some measurable
fraction of the total, and while I eventually succeeded, I'm not
sure that too many real workloads will look like that.  However,
if we can make it smaller as well as faster, that seems like a win
even if it's not a measurable fraction of most workloads.

    regards, tom lane


Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-11-07 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> I think it would be a good idea, as Thomas says, to order the qual
> clauses at an earlier stage and then remember our decision.  However,
> we have to think about whether that's going to increase planning time
> in a noticeable way.  I wonder why we currently postpone this until
> such a late phase of planning.

Because (1) up to now there's been no need to consider the qual ordering
till later, and (2) re-doing that sort for each path seemed unduly
expensive.  If we're to try to estimate whether later quals will be
reached, then sure the ordering becomes important.  I'm still concerned
about (2) though.  If there were a way to pre-sort the quals once for
all paths, the problem goes away, but I don't think that works ---
indexscans may segregate some quals, and in join cases different paths
will actually have different sets of quals they need to check depending
on the join order.

So the bottom line here is how much is it going to cost us to add this
additional refinement in cost estimation, and is it worth it given our
extremely poor level of accuracy in expression cost estimation.  Color
me dubious.

    regards, tom lane


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-07 Thread Tom Lane
I've been getting less and less excited about this patch, because I still
couldn't measure any above-the-noise performance improvement without
artificial exaggerations, and some cases seemed actually slower.

However, this morning I had an epiphany: why are we sorting at all?

There is no requirement that these functions preserve the physical
ordering of the tuples' data areas, only that the line-pointer ordering be
preserved.  Indeed, reorganizing the data areas into an ordering matching
the line pointers is probably a good thing, because it should improve
locality of access in future scans of the page.  This is trivial to
implement if we copy the data into a workspace area and back again, as
I was already proposing to do to avoid memmove.  Moreover, at that point
there's little value in a separate compactify function at all: we can
integrate the data-copying logic into the line pointer scan loops in
PageRepairFragmentation and PageIndexMultiDelete, and get rid of the
costs of constructing the intermediate itemIdSortData arrays.

That led me to the attached patch, which is the first version of any
of this work that produces an above-the-noise performance win for me.
I'm seeing 10-20% gains on this modified version of Yura's original
example:

psql -f test3setup.sql
pgbench -M prepared -c 3 -s 1000 -T 300 -P 3 -n -f test3.sql

(sql scripts also attached below; I'm using 1GB shared_buffers and
fsync off, other parameters stock.)

However, there are a couple of objections that could be raised to
this patch:

1. It's trading off per-byte work, in the form of an extra memcpy,
to save sorting work that has per-tuple costs.  Therefore, the relatively
narrow tuples used in Yura's example offer a best-case scenario;
with wider tuples the performance might be worse.

2. On a platform with memmove not so much worse than memcpy as I'm
seeing on my RHEL6 server, trading memmove for memcpy might not be
such a win.

To address point 1, I tried some measurements on the standard pgbench
scenario, which uses significantly wider tuples.  In hopes of addressing
point 2, I also ran the measurements on a laptop running Fedora 25
(gcc 6.4.1, glibc 2.24); I haven't actually checked memmove vs memcpy
on that machine, but at least it's a reasonably late-model glibc.

What I'm getting from the standard pgbench measurements, on both machines,
is that this patch might be a couple percent slower than HEAD, but that is
barely above the noise floor so I'm not too sure about it.

So I think we should seriously consider the attached, but it'd be a
good idea to benchmark it on a wider variety of platforms and test
cases.

regards, tom lane

drop table if exists test3;

create unlogged table test3 (
 id integer PRIMARY KEY with (fillfactor=85),
 val text
 ) WITH (fillfactor=85);

insert into test3 select i, '!'||i from generate_series(1, 1000) as i;

vacuum analyze; checkpoint;

create or replace function dotest3(n int, scale float8) returns void
language plpgsql as $$
begin
  for i in 1..n loop
    declare
      id1 int := random() * scale;
      id2 int := random() * scale;
    begin
      perform * from test3 where id = id1;
      update test3 set val = '!'|| id2 where id = id1;
    end;
  end loop;
end $$;
select dotest3(100, :scale);
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index b6aa2af..73b73de 100644
*** a/src/backend/storage/page/bufpage.c
--- b/src/backend/storage/page/bufpage.c
*** PageRestoreTempPage(Page tempPage, Page oldPage)
*** 415,471 ****
  }
  
  /*
-  * sorting support for PageRepairFragmentation and PageIndexMultiDelete
-  */
- typedef struct itemIdSortData
- {
- 	uint16		offsetindex;	/* linp array index */
- 	int16		itemoff;		/* page offset of item data */
- 	uint16		alignedlen;		/* MAXALIGN(item data len) */
- } itemIdSortData;
- typedef itemIdSortData *itemIdSort;
- 
- static int
- itemoffcompare(const void *itemidp1, const void *itemidp2)
- {
- 	/* Sort in decreasing itemoff order */
- 	return ((itemIdSort) itemidp2)->itemoff -
- 		((itemIdSort) itemidp1)->itemoff;
- }
- 
- /*
-  * After removing or marking some line pointers unused, move the tuples to
-  * remove the gaps caused by the removed items.
-  */
- static void
- compactify_tuples(itemIdSort itemidbase, int nitems, Page page)
- {
- 	PageHeader	phdr = (PageHeader) page;
- 	Offset		upper;
- 	int			i;
- 
- 	/* sort itemIdSortData array into decreasing itemoff order */
- 	qsort((char *) itemidbase, nitems, sizeof(itemIdSortData),
- 		  itemoffcompare);
- 
- 	upper = phdr->pd_special;
- 	for (i = 0; i < nitems; i++)
- 	{
- 		itemIdSort	itemidptr = &itemidbase[i];
- 		ItemId		lp;
- 
- 		lp = PageGetItemId(page, itemidptr->offsetindex + 1);
- 		upper -= itemidptr->alignedlen;
- 		memmove((char *) page + upper,
- 				(char *) page + itemidptr->itemoff,
- 				itemidptr->alignedlen);
- 		lp->lp_off = upper;
- 	}
- 
- 	phdr->pd_upper = upper;
- }
- 
- /

Re: [HACKERS] proposal - pg_dump: flag to suppress output of SQL comments

2017-11-06 Thread Tom Lane
Malcolm Locke <m...@wholemeal.co.nz> writes:
> Would a patch to add a flag to pg_dump to suppress the output of SQL
> comments be likely to be accepted?

Not unless you can come up with a better rationale than this:

> The SQL generated by pg_dump seems to be fairly constant between
> Postgres versions, however the structure of the SQL comments in the
> dumps changes quite frequently between Postgres versions.  This creates
> a lot of churn on these structure files, unrelated to actual changes in
> the database structure, in our VCS when developers are using different
> versions of Postgres.

That seems like complete nonsense; we don't change the comments more
frequently than other aspects of pg_dump's output, in fact probably
much less often.

Just to make sure I'm not totally mistaken about this, I diffed the
results from pg_dump 9.2 through HEAD dumping the same 9.2 database.
I do see a couple of rounds of comment changes, but there are also two
different rounds of changes in dump order, a round of changes in layout
of view/rule reverse-compilations, a round of changes in the
schema-qualification of ALTER TYPE OWNER commands, multiple changes in
the dump header (particularly the collection of SET commands there), and
assorted changes in minor details of syntax.  And the particular test
database I was using (the 9.2 regression database) doesn't even trigger
some other changes that have been made, such as how to break circular
dependencies involving views.  We're not going to introduce
backwards-compatibility options for that sort of stuff (I hope), so
I do not think that a switch of this sort is really going to produce
the end result you're wishing for.

You might be able to standardize things a bit better if you could get
all your developers to use the same late-model version of pg_dump
while producing output to go into the VCS.  That won't be a 100%
solution, because some of the version-specific output is generated
on the backend side, but I bet it would improve matters a lot.

        regards, tom lane


Re: [HACKERS] [pgsql-www] Schedule for migration to pglister

2017-11-06 Thread Tom Lane
Magnus Hagander <mag...@hagander.net> writes:
> On Mon, Nov 6, 2017 at 4:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Hm, around here it's no match -> spam bucket.  But in any case, why

> I think you're quite uncommon in that setup.

Interesting, because "it's not addressed to me (or any list I'm on)"
is the best single spam filtering rule I know, and has been for a
decade or two.

But we veer far off topic here.  Do as you will.

regards, tom lane


Re: [HACKERS] [pgsql-www] Schedule for migration to pglister

2017-11-06 Thread Tom Lane
Magnus Hagander <mag...@hagander.net> writes:
> On Mon, Nov 6, 2017 at 4:40 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I suggest doing that the other way 'round.  Otherwise, the email
>> about the change will inevitably go into a lot of people's bit
>> buckets if they haven't adjusted their mail filters yet.

> The argument for doing it after the migration is that the complaints that
> we have received so far have all been from people where email ends up in
> the *inbox* after the migration, not the bitbucket. That's the default
> action in most people's MUAs when their rules no longer match...

Hm, around here it's no match -> spam bucket.  But in any case, why
would you not want to send it before so that it would end up where
they're accustomed to seeing the list's traffic?

regards, tom lane


Re: [HACKERS] [pgsql-www] Schedule for migration to pglister

2017-11-06 Thread Tom Lane
Stephen Frost <sfr...@snowman.net> writes:
> Each list will receive an email with a link to the wiki about the
> migration after the list has been migrated.

I suggest doing that the other way 'round.  Otherwise, the email
about the change will inevitably go into a lot of people's bit
buckets if they haven't adjusted their mail filters yet.

    regards, tom lane


Re: [HACKERS] Early locking option to parallel backup

2017-11-05 Thread Tom Lane
Andres Freund <and...@anarazel.de> writes:
> Well, the current approach afaics requires #relations * 2 locks, whereas
> acquiring them in every worker would scale that with the number of
> workers.

Yeah, that's gonna be a problem with this proposal.

> IIUC the problem here is that even though a lock is already
> held by the main backend an independent locker's request will prevent
> the on-demand lock by the dump worker from being granted.  It seems to
> me the correct fix here would be to somehow avoid the fairness logic in
> the parallel dump case - although I don't quite know how to best do so.

I wonder if we couldn't somehow repurpose the work that was done for
parallel workers' locks.  Lots of security-type issues to be handled
if we're to open that up to clients, but maybe it's solvable.  For
instance, maybe only allowing it to clients sharing the same snapshot
would help.

        regards, tom lane


Re: [HACKERS] why not parallel seq scan for slow functions

2017-11-05 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> This looks like it's on the right track to me.  I hope Tom will look
> into it, but if he doesn't I may try to get it committed myself.

I do plan to take a look at it during this CF.

> +/* Set or update cheapest_total_path and related fields */
> +set_cheapest(current_rel);

> I wonder if it's really OK to call set_cheapest() a second time for
> the same relation...

It's safe enough, we do it in some places already when converting
a relation to dummy.  But having to do that in a normal code path
suggests that something's not right about the design ...

        regards, tom lane


Re: [HACKERS] Display number of heap accesses for index scans

2017-11-05 Thread Tom Lane
Peter Geoghegan <p...@bowt.ie> writes:
> Andres Freund <and...@anarazel.de> wrote:
>> The number of index lookups that failed to return anything can be a
>> critical performance factor in OLTP workloads.  Therefore it seems like
>> it'd be a good idea to extend the explain analyze output to include that
>> information.

> I certainly agree.

Doesn't the EXPLAIN (BUFFERS) output already address this?

regards, tom lane


Re: [HACKERS] Custom compression methods

2017-11-05 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> A basic problem here is that, as proposed, DROP COMPRESSION METHOD may
> break your database irretrievably.  If there's no data compressed
> using the compression method you dropped, everything is cool -
> otherwise everything is broken and there's no way to recover.  The
> only obvious alternative is to disallow DROP altogether (or make it
> not really DROP).

> Both of those alternatives sound fairly unpleasant to me, but I'm not
> exactly sure what to recommend in terms of how to make it better.
> Ideally anything we expose as an SQL command should have a DROP
> command that undoes whatever CREATE did and leaves the database in an
> intact state, but that seems hard to achieve in this case.

If the use of a compression method is tied to specific data types and/or
columns, then each of those could have a dependency on the compression
method, forcing a type or column drop if you did DROP COMPRESSION METHOD.
That would leave no reachable data using the removed compression method.
So that part doesn't seem unworkable on its face.
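
To make the dependency behavior concrete, here's a sketch in entirely
hypothetical syntax (none of these commands exist today):

create compression method cm1 handler cm1_handler;
create table t (payload text compression cm1);
drop compression method cm1;          -- would fail: t.payload depends on it
drop compression method cm1 cascade;  -- would drop column t.payload too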

IIRC, the bigger concerns in the last discussion had to do with
replication, ie, can downstream servers make sense of the data.
Maybe that's not any worse than the issues you get with non-core
index AMs, but I'm not sure.

    regards, tom lane


[HACKERS] Release notes for next week's minor releases are up for review

2017-11-04 Thread Tom Lane
... at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=42de8a0255c2509bf179205e94b9d65f9d6f3cf9

regards, tom lane


Re: [HACKERS] [COMMITTERS] pgsql: Account for catalog snapshot in PGXACT->xmin updates.

2017-11-04 Thread Tom Lane
Noah Misch <n...@leadboat.com> writes:
> I plan to use the attached patch after the minor release tags land.  If
> there's significant support, I could instead push before the wrap.

This looks fine to me --- I think you should push now.

(which reminds me, I'd better get on with making release notes.)

    regards, tom lane


Re: [HACKERS] Parallel Plans and Cost of non-filter functions

2017-11-03 Thread Tom Lane
Paul Ramsey <pram...@cleverelephant.ca> writes:
>> Whether I get a parallel aggregate seems entirely determined by the number
>> of rows, not the cost of preparing those rows.

> This is true, as far as I can tell and unfortunate. Feeding tables with
> 100ks of rows, I get parallel plans, feeding 10ks of rows, never do, no
> matter how costly the work going on within. That's true of changing costs
> on the subquery select list, and on the aggregate transfn.

This sounds like it might be the same issue being discussed in

https://www.postgresql.org/message-id/flat/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yhwu4c4us5jgvgx...@mail.gmail.com

        regards, tom lane


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-03 Thread Tom Lane
I wrote:
> Have not looked at the 0002 patch yet.

I looked at that one, and it seems to be a potential win with no
downside, so pushed.  (I tweaked it slightly to avoid an unnecessary
conflict with the test patch I posted earlier.)

regards, tom lane


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Jeff Janes <jeff.ja...@gmail.com> writes:
> With this v3 patch (assuming this is the one you just committed
> as ec42a1dcb30de235b252f6d4) am now getting make check failures.

There's a followup commit already :-(

    regards, tom lane


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-11-03 Thread Tom Lane
Michael Paquier <michael.paqu...@gmail.com> writes:
> On Fri, Nov 3, 2017 at 1:10 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Fri, Nov 3, 2017 at 2:54 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I've marked the CF entry closed.  However, I'm not sure if we're quite
>>> done with this thread.  Weren't we going to adjust nbtree and hash to
>>> be more aggressive about labeling their metapages as REGBUF_STANDARD?

>> I have already posted the patches [1] for the same in this thread and
>> those are reviewed [2][3] as well. I have adjusted the comments as per
>> the latest commit.  Please find the updated patches attached.

> Confirmed. Setting those makes sense even if REGBUF_WILL_INIT is set,
> at least for page masking.

Thanks, I'd forgotten those patches were already posted.  Looks good,
so pushed.

Looking around, I noted that contrib/bloom also had the disease of
not telling log_newpage it was writing a standard-format metapage,
so I fixed that too.
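
(For reference, the pattern is just a flag at the relevant call sites;
this is illustrative rather than the exact committed hunks:

XLogRegisterBuffer(0, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);

or, on the log_newpage path,

log_newpage_buffer(metabuf, true);	/* page_std = true */

so that replay and page masking know the pd_lower/pd_upper hole can be
ignored.)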

regards, tom lane


Re: [HACKERS] ucs_wcwidth vintage

2017-11-03 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> Thomas Munro wrote:
>> src/backend/utils/mb/wchar.c contains a ~16 year old wcwidth
>> implementation that originally arrived in commit df4cba68, but the
>> upstream code[1] apparently continued evolving and there have been
>> more Unicode revisions since.

> I think we should update it to current upstream source, then, just like
> we (are supposed to) do for any other piece of code we adopt.

+1 ... also, is that upstream still the best reference?

    regards, tom lane


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-03 Thread Tom Lane
Claudio Freire <klaussfre...@gmail.com> writes:
> On Thu, Nov 2, 2017 at 11:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> BTW, the originally given test case shows no measurable improvement
>> on my box.

> I did manage to reproduce the original test and got a consistent improvement.

It occurred to me that I could force the issue by hacking bufpage.c to
execute compactify_tuples multiple times each time it was called, as in
the first patch attached below.  This has nothing directly to do with real
performance of course, but it's making use of the PG system to provide
realistic test data for microbenchmarking compactify_tuples.  I was a bit
surprised to find that I had to set the repeat count to 1000 to make
compactify_tuples really dominate the runtime (while using the originally
posted test case ... maybe there's a better one?).  But once I did get it
to dominate the runtime, perf gave me this for the CPU hotspots:

+   27.97%  27.88%  229040  postmaster  libc-2.12.so  [.] memmove
+   14.61%  14.57%  119704  postmaster  postgres      [.] compactify_tuples
+   12.40%  12.37%  101566  postmaster  libc-2.12.so  [.] _wordcopy_bwd_aligned
+   11.68%  11.65%   95685  postmaster  libc-2.12.so  [.] _wordcopy_fwd_aligned
+    7.67%   7.64%   62801  postmaster  postgres      [.] itemoffcompare
+    7.00%   6.98%   57303  postmaster  postgres      [.] compactify_tuples_loop
+    4.53%   4.52%   37111  postmaster  postgres      [.] pg_qsort
+    1.71%   1.70%   13992  postmaster  libc-2.12.so  [.] memcpy

which says that micro-optimizing the sort step is a complete, utter waste
of time, and what we need to be worried about is the data copying part.

The memcpy part of the above is presumably from the scaffolding memcpy's
in compactify_tuples_loop, which is interesting because that's moving as
much data as the memmove's are.  So at least with RHEL6's version of
glibc, memmove is apparently a lot slower than memcpy.

This gave me the idea to memcpy the page into some workspace and then use
memcpy, not memmove, to put the tuples back into the caller's copy of the
page.  That gave me about a 50% improvement in observed TPS, and a perf
profile like this:

+   38.50%  38.40%  299520  postmaster  postgres      [.] compactify_tuples
+   31.11%  31.02%  241975  postmaster  libc-2.12.so  [.] memcpy
+    8.74%   8.72%   68022  postmaster  postgres      [.] itemoffcompare
+    6.51%   6.49%   50625  postmaster  postgres      [.] compactify_tuples_loop
+    4.21%   4.19%   32719  postmaster  postgres      [.] pg_qsort
+    1.70%   1.69%   13213  postmaster  postgres      [.] memcpy@plt

There still doesn't seem to be any point in replacing the qsort,
but it does seem like something like the second attached patch
might be worth doing.
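
In skeleton form, the idea is this (a minimal sketch assuming the
itemIdSortData fields from bufpage.c and an already-sorted itemidbase;
the actual attachment differs in detail):

static void
compactify_tuples_copy(itemIdSort itemidbase, int nitems, Page page)
{
	PageHeader	phdr = (PageHeader) page;
	Offset		upper = phdr->pd_special;
	char		pagecopy[BLCKSZ];	/* the real patch maxaligns this */
	int			i;

	/* snapshot the page, so source and destination can never overlap */
	memcpy(pagecopy, page, BLCKSZ);

	/* then repack tuples into the caller's page top-down, using memcpy */
	for (i = 0; i < nitems; i++)
	{
		itemIdSort	itemidptr = &itemidbase[i];
		ItemId		lp = PageGetItemId(page, itemidptr->offsetindex + 1);

		upper -= itemidptr->alignedlen;
		memcpy((char *) page + upper,
			   pagecopy + itemidptr->itemoff,
			   itemidptr->alignedlen);
		lp->lp_off = upper;
	}

	phdr->pd_upper = upper;
}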

So I'm now wondering why my results seem to be so much different
from those of other people who have tried this, both as to whether
compactify_tuples is worth working on at all and as to what needs
to be done to it if so.  Thoughts?

        regards, tom lane

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 41642eb..bf6d308 100644
*** a/src/backend/storage/page/bufpage.c
--- b/src/backend/storage/page/bufpage.c
*** compactify_tuples(itemIdSort itemidbase,
*** 465,470 
--- 465,489 
  	phdr->pd_upper = upper;
  }
  
+ static void
+ compactify_tuples_loop(itemIdSort itemidbase, int nitems, Page page)
+ {
+ 	itemIdSortData copy[Max(MaxIndexTuplesPerPage, MaxHeapTuplesPerPage)];
+ 	union {
+ 		char page[BLCKSZ];
+ 		double align;
+ 	} pagecopy;
+ 	int i;
+ 
+ 	for (i = 1; i < 1000; i++)
+ 	{
+ 		memcpy(copy, itemidbase, sizeof(itemIdSortData) * nitems);
+ 		memcpy(pagecopy.page, page, BLCKSZ);
+ 		compactify_tuples(copy, nitems, pagecopy.page);
+ 	}
+ 	compactify_tuples(itemidbase, nitems, page);
+ }
+ 
  /*
   * PageRepairFragmentation
   *
*** PageRepairFragmentation(Page page)
*** 560,566 
  	 errmsg("corrupted item lengths: total %u, available space %u",
  			(unsigned int) totallen, pd_special - pd_lower)));
  
! 		compactify_tuples(itemidbase, nstorage, page);
  	}
  
  	/* Set hint bit for PageAddItem */
--- 579,585 
  	 errmsg("corrupted item lengths: total %u, available space %u",
  			(unsigned int) totallen, pd_special - pd_lower)));
  
! 		compactify_tuples_loop(itemidbase, nstorage, page);
  	}
  
  	/* Set hint bit for PageAddItem */
*** PageIndexMultiDelete(Page page, OffsetNu
*** 940,946 
 

Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-11-03 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> It might work to build the new key in a context that's initially a
>> child of CurrentMemoryContext, then reparent it to be a child of
>> CacheMemoryContext when done. 

> That's another way (than the PG_TRY block), but I think it's more
> complicated with no gain.

I disagree: PG_TRY blocks are pretty expensive.

    regards, tom lane


Re: [HACKERS] Where is it documented what role executes constraint triggers?

2017-11-03 Thread Tom Lane
Chapman Flack <c...@anastigmatix.net> writes:
> From a little experimenting in 9.5, it seems that a referential
> integrity trigger is executed with the identity of the referencED
> table's owner, but I have not been able to find this covered in
> the docs. Is this a documentation oversight, or is it explained
> somewhere I didn't look (or may have skimmed right over it)?

Don't know if it's documented anywhere user-facing, but a look into
the code in ri_triggers.c says we run RI queries as the owner of
whichever table the query touches (the referenced table for verification
of FK inserts, the referencing table when cascading a PK change).
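
For illustration (roles and tables invented):

set role alice;
create table pk (id int primary key);
grant references on pk to bob;
set role bob;
create table fk (id int references pk);

An INSERT into fk fires the FK-verification SELECT against pk as alice,
the referenced table's owner; an UPDATE or DELETE cascading from pk
into fk runs as bob, the referencing table's owner.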

    regards, tom lane


Re: [HACKERS] Add some const decorations to prototypes

2017-11-03 Thread Tom Lane
Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
> On 10/31/17 10:56, Tom Lane wrote:
>>> Some functions have a strtol()-like behavior
>>> where they take in a const char * and return a pointer into that as
>>> another argument.  In those cases, I added a cast or two.

>> ... but I'm not sure that it's an improvement in cases where you have to
>> cast away the const somewhere else.  I realize that strtol has an ancient
>> pedigree, but I do not think it's very good design.

> Would you prefer leaving the input argument as char *, or change the
> endptr argument to const as well?

Just leave it as char*.  If you change the endptr argument you're going to
force every call site to change their return variable, and some of them
would end up having to cast away the const on their end.

regards, tom lane


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> Do we still need the complication in brinsummarize to discriminate
>> against the last partial range?  Now that the lock consideration
>> is gone, I think that might be a wart.

> In the case of VACUUM, it's not desirable to create a summarization for
> the last partial range, because if the table is still being filled, that
> would slow down the insertion process.

Hm.  Okay, but you should change the comment then, because "we do not want
to spend one RelationGetNumberOfBlocks call" is a pretty weak reason.

Also, I think I would accept that argument for autovacuum, but maybe
not so much for a manual vacuum.  Maybe you should drive it off
IsAutovacuumWorker rather than which operation is being done.
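
That is, something along the lines of (sketch only; the real test is
IsAutoVacuumWorkerProcess(), and brinsummarize would need to grow a
flag for it):

bool		include_partial = !IsAutoVacuumWorkerProcess();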

regards, tom lane


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-03 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> Alvaro Herrera wrote:
>> Maybe a solution is to call RelationGetNumberOfBlocks() after the
>> placeholder tuple has been inserted, for the case where we would be
>> scanning past end of relation; passing InvalidBlockNumber as stop point
>> would indicate to do things that way.  I'll try with that approach now.

> Yeah, I think this approach results in better code.  The attached patch
> implements that, and it passes the test for me (incl. calling
> brin_summarize_new_values concurrently with vacuum, when running the
> insert; the former does include the final page range whereas the latter
> does not.)

Hm, so IIUC the point is that once the placeholder tuple is in, we can
rely on concurrent inserters to update it for insertions into pages that
are added after we determine our scan stop point.  But if the scan stop
point is chosen before inserting the placeholder, then we have a race
condition.

The given code seems a brick or so short of a load, though: if the table
has been extended sufficiently, it could compute scanNumBlks as larger
than bs_pagesPerRange, no?  You need to clamp the computation result.
Also, shouldn't the passed-in heapBlk always be a multiple of
pagesPerRange already?
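
That is, something like (a sketch, using the variable names as I read
them in the patch):

scanNumBlks = Min(state->bs_pagesPerRange, heapNumBlks - heapBlk);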

Do we still need the complication in brinsummarize to discriminate
against the last partial range?  Now that the lock consideration
is gone, I think that might be a wart.

regards, tom lane


Re: [HACKERS] Small improvement to compactify_tuples

2017-11-02 Thread Tom Lane
Sokolov Yura <funny.fal...@postgrespro.ru> writes:
> [ 0001-Improve-compactify_tuples.patch, v5 or thereabouts ]

I started to review this patch.  I spent a fair amount of time on
beautifying the code, because I found it rather ugly and drastically
undercommented.  Once I had it to the point where it seemed readable,
I went to check the shellsort algorithm against Wikipedia's entry,
and found that this appears to be an incorrect implementation of
shellsort: where pg_shell_sort_pass has

for (_i = off; _i < _n; _i += off) \

it seems to me that we need to have

for (_i = off; _i < _n; _i += 1) \

or maybe just _i++.  As-is, this isn't h-sorting the whole file,
but just the subset of entries that have multiple-of-h indexes
(ie, the first of the h distinct subfiles that should get sorted).
The bug is masked by the final pass of plain insertion sort, but
we are not getting the benefit we should get from the earlier passes.
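
For comparison, a textbook h-pass visits every index from h upward,
not just multiples of h (illustrative C, not the patch's macro):

for (i = h; i < n; i++)
{
	itemIdSortData tmp = a[i];
	int			j;

	/* shift h-spaced predecessors that should sort after tmp */
	for (j = i; j >= h && itemoffcompare(&tmp, &a[j - h]) < 0; j -= h)
		a[j] = a[j - h];
	a[j] = tmp;
}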

However, I'm a bit dubious that it's worth fixing that; instead
my inclination would be to rip out the shellsort implementation
entirely.  The code is only using it for the nitems <= 48 case
(which makes the first three offset steps certainly no-ops) and
I am really unconvinced that it's worth expending the code space
for a shellsort rather than plain insertion sort in that case,
especially when we have good reason to think that the input data
is nearly sorted.

BTW, the originally given test case shows no measurable improvement
on my box.  I was eventually able to convince myself by profiling
that the patch makes us spend less time in compactify_tuples, but
this test case isn't a very convincing one.

So, quite aside from the bug, I'm not excited about committing the
attached as-is.  I think we should remove pg_shell_sort and just
use pg_insertion_sort.  If somebody can show a test case that
provides a measurable speed improvement from the extra code,
I could be persuaded to reconsider.

I also wonder if the nitems <= 48 cutoff needs to be reconsidered
in light of this.  But since I can hardly measure any benefit from
the patch at all, I'm not in the best position to test different
values for that cutoff.

Have not looked at the 0002 patch yet.

    regards, tom lane

diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 41642eb..1af1b85 100644
*** a/src/backend/storage/page/bufpage.c
--- b/src/backend/storage/page/bufpage.c
***
*** 18,23 
--- 18,24 
  #include "access/itup.h"
  #include "access/xlog.h"
  #include "storage/checksum.h"
+ #include "utils/inline_sort.h"
  #include "utils/memdebug.h"
  #include "utils/memutils.h"
  
*** typedef struct itemIdSortData
*** 425,439 
  } itemIdSortData;
  typedef itemIdSortData *itemIdSort;
  
! static int
  itemoffcompare(const void *itemidp1, const void *itemidp2)
  {
- 	/* Sort in decreasing itemoff order */
  	return ((itemIdSort) itemidp2)->itemoff -
  		((itemIdSort) itemidp1)->itemoff;
  }
  
  /*
   * After removing or marking some line pointers unused, move the tuples to
   * remove the gaps caused by the removed items.
   */
--- 426,542 
  } itemIdSortData;
  typedef itemIdSortData *itemIdSort;
  
! /* Comparator for sorting in decreasing itemoff order */
! static inline int
  itemoffcompare(const void *itemidp1, const void *itemidp2)
  {
  	return ((itemIdSort) itemidp2)->itemoff -
  		((itemIdSort) itemidp1)->itemoff;
  }
  
  /*
+  * Sort an array of itemIdSort's on itemoff, descending.
+  *
+  * This uses Shell sort.  Given that array is small and itemoffcompare
+  * can be inlined, it is much faster than general-purpose qsort.
+  */
+ static void
+ sort_itemIds_small(itemIdSort itemidbase, int nitems)
+ {
+ 	pg_shell_sort(itemIdSortData, itemidbase, nitems, itemoffcompare);
+ }
+ 
+ /*
+  * Sort an array of itemIdSort's on itemoff, descending.
+  *
+  * This uses bucket sort:
+  * - single pass of stable prefix sort on high 8 bits of itemoffs
+  * - then insertion sort on buckets larger than 1 element
+  */
+ static void
+ sort_itemIds(itemIdSort itemidbase, int nitems)
+ {
+ 	/* number of buckets to use: */
+ #define NSPLIT 256
+ 	/* divisor to scale input values into 0..NSPLIT-1: */
+ #define PREFDIV (BLCKSZ / NSPLIT)
+ 	/* per-bucket counts; we need two extra elements, see below */
+ 	uint16		count[NSPLIT + 2];
+ 	itemIdSortData copy[Max(MaxIndexTuplesPerPage, MaxHeapTuplesPerPage)];
+ 	int			i,
+ 				max,
+ 				total,
+ 				pos,
+ 				highbits;
+ 
+ 	Assert(nitems <= lengthof(copy));
+ 
+ 	/*
+ 	 * Count how many items in each bucket.  We assume all itemoff values are
+ 	 * less than BLCKSZ, therefore dividing by PREFDIV gives a value less than
+ 	 * NSPLIT.
+ 	 */
+ 	memset(count, 0, sizeof(count));
+ 	for (i = 0; i < nitems; i++)
+ 	{
+ 		highbits = itemidbase[i].itemoff / PREFDIV;
+ 		cou

Re: [HACKERS] pgbench - use enum for meta commands

2017-11-02 Thread Tom Lane
Fabien COELHO <coe...@cri.ensmp.fr> writes:
> [ pgbench-enum-meta-2.patch ]

Pushed with some trivial cosmetic adjustments (pgindent changed
it more than I did).

regards, tom lane


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> So what would happen if we just don't summarize partial ranges?

> Index scan would always have to read all the heap pages for that partial
> range.  Maybe not a big issue, but when you finish loading a table, it'd
> be good to have a mechanism to summarize that partial range ...

I guess if the ranges are big this might not be nice.

> Rather than remove the capability, I'd be inclined to make
> brin_summarize_new_values summarize the final partial range, and have
> VACUUM not do it.  Would that be too inconsistent?

That doesn't really get you out of the problem that this is an abuse of
the relation extension lock, and is likely to cause issues when people
try to optimize that locking mechanism.

Why is it that the regular technique doesn't work, ie create a placeholder
tuple and let it get added to by any insertions that happen?

    regards, tom lane


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-11-02 Thread Tom Lane
Amit Langote <langote_amit...@lab.ntt.co.jp> writes:
> On 2017/09/26 16:30, Michael Paquier wrote:
>> Cool, let's switch it back to a ready for committer status then.

> Sure, thanks.

Pushed with some cosmetic adjustments --- mostly, making the comments more
explicit about why we need the apparently-redundant assignments to
pd_lower.

I've marked the CF entry closed.  However, I'm not sure if we're quite
done with this thread.  Weren't we going to adjust nbtree and hash to
be more aggressive about labeling their metapages as REGBUF_STANDARD?

        regards, tom lane


Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-11-02 Thread Tom Lane
Haribabu Kommi <kommi.harib...@gmail.com> writes:
> The changes are fine and now it reports proper live tuples in both
> pg_class and the stats.  The other issue, of continuous vacuuming
> leading to a decrease in the number of live tuples, is not related to
> this patch and can be handled separately.

I did not like this patch too much, because it did nothing to fix
the underlying problem of confusion between "live tuples" and "total
tuples"; in fact, it made that worse, because now the comment on
LVRelStats.new_rel_tuples is a lie.  And there's at least one usage
of that field value where I think we do want total tuples not live
tuples: where we pass it to index AM cleanup functions.  Indexes
don't really care whether heap entries are live or not, only that
they're there.  Plus the VACUUM VERBOSE log message that uses the
field is supposed to be reporting total tuples not live tuples.

I hacked up the patch trying to make that better, finding along the
way that contrib/pgstattuple shared in the confusion about what
it was trying to count.  Results attached.

However, I'm not sure we're there yet, because there remains a fairly
nasty discrepancy even once we've gotten everyone onto the same page
about reltuples counting just live tuples: VACUUM and ANALYZE have
different definitions of what's "live".  In particular they do not treat
INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same.  Should we
try to do something about that?  If so, what?  It looks like ANALYZE's
behavior is pretty tightly constrained, judging by the comments in
acquire_sample_rows.

Another problem is that it looks like CREATE INDEX will set reltuples
to the total number of heap entries it chose to index, because that
is what IndexBuildHeapScan counts.  Maybe we should adjust that?

regards, tom lane

diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 5bf0613..68211c6 100644
*** a/contrib/pgstattuple/pgstatapprox.c
--- b/contrib/pgstattuple/pgstatapprox.c
*** statapprox_heap(Relation rel, output_typ
*** 68,74 
  	Buffer		vmbuffer = InvalidBuffer;
  	BufferAccessStrategy bstrategy;
  	TransactionId OldestXmin;
- 	uint64		misc_count = 0;
  
  	OldestXmin = GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM);
  	bstrategy = GetAccessStrategy(BAS_BULKREAD);
--- 68,73 
*** statapprox_heap(Relation rel, output_typ
*** 153,177 
  			tuple.t_tableOid = RelationGetRelid(rel);
  
  			/*
! 			 * We count live and dead tuples, but we also need to add up
! 			 * others in order to feed vac_estimate_reltuples.
  			 */
			switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
  			{
- case HEAPTUPLE_RECENTLY_DEAD:
- 	misc_count++;
- 	/* Fall through */
- case HEAPTUPLE_DEAD:
- 	stat->dead_tuple_len += tuple.t_len;
- 	stat->dead_tuple_count++;
- 	break;
  case HEAPTUPLE_LIVE:
  	stat->tuple_len += tuple.t_len;
  	stat->tuple_count++;
  	break;
! case HEAPTUPLE_INSERT_IN_PROGRESS:
! case HEAPTUPLE_DELETE_IN_PROGRESS:
! 	misc_count++;
  	break;
  default:
  	elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
--- 152,172 
  			tuple.t_tableOid = RelationGetRelid(rel);
  
  			/*
! 			 * We follow VACUUM's lead in counting INSERT_IN_PROGRESS and
! 			 * DELETE_IN_PROGRESS tuples as "live".
  			 */
			switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
  			{
  case HEAPTUPLE_LIVE:
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
  	stat->tuple_len += tuple.t_len;
  	stat->tuple_count++;
  	break;
! case HEAPTUPLE_DEAD:
! case HEAPTUPLE_RECENTLY_DEAD:
! 	stat->dead_tuple_len += tuple.t_len;
! 	stat->dead_tuple_count++;
  	break;
  default:
  	elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
*** statapprox_heap(Relation rel, output_typ
*** 184,191 
  
  	stat->table_len = (uint64) nblocks * BLCKSZ;
  
  	stat->tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned,
! 			   stat->tuple_count + misc_count);
  
  	/*
  	 * Calculate percentages if the relation has one or more pages.
--- 179,190 
  
  	stat->table_len = (uint64) nblocks * BLCKSZ;
  
+ 	/*
+ 	 * Extrapolate the live-tuple count to the whole table in the same way
+ 	 * that VACUUM does.  (XXX shouldn't we also extrapolate tuple_len?)
+ 	 */
  	stat->tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned,
! 			   stat->tuple_count);
  
  	/*
  	 * Calculate percentages if the relation has one or more pages.
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ef60a58..87bba31 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
*** SCRAM-SHA-256$iteration
*** 1739,1746 
float4

   

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> If VACUUM and brin_summarize_new_values both ignore the partial
>> range, then what else would request this?  Can't we just decree
>> that we don't summarize the partial range, period?

> brin_summarize_range() can do it.

So what would happen if we just don't summarize partial ranges?

        regards, tom lane


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> 1. in VACUUM or brin_summarize_new_values, we only process fully loaded
> ranges, and ignore the partial range at end of table.

OK.

> 2. when summarization is requested on the partial range at the end of a
> table, we acquire extension lock on the rel, then compute relation size
> and run summarization with the lock held.  This guarantees that we don't
> miss any pages.  This is bad for concurrency though, so it's only done
> in that specific scenario.

Hm, I wonder how this will play with the active proposals around
reimplementing relation extension locks.  All that work seems to be
assuming that the extension lock is only held for a short time and
nothing much beyond physical extension is done while holding it.
I'm afraid that you may be introducing a risk of e.g. deadlocks
if you do this.

If VACUUM and brin_summarize_new_values both ignore the partial
range, then what else would request this?  Can't we just decree
that we don't summarize the partial range, period?

        regards, tom lane


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> Where are we on this --- do you want me to push the brin_doupdate
>> fix I proposed, or were you intending to merge that into a
>> larger patch?

> Please push your fixes, I'll post my proposed patch for the other bug
> afterwards; they are unrelated problems after all.

OK, I was not sure if you considered them related or not.  Will push.

    regards, tom lane


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-11-02 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
>> Hmm, I'm pretty sure we stress-tested brin in pretty much the same way.
>> But I see this misbehavior too.  Looking ...

> Turns out that this is related to concurrent growth of the table while
> the summarization process is scanning -- so new pages have appeared at
> the end of the table after the end point has been determined.  It would
> be a pain to determine number of blocks for each range, so I'm looking
> for a simple way to fix it without imposing so much overhead.

Where are we on this --- do you want me to push the brin_doupdate
fix I proposed, or were you intending to merge that into a
larger patch?  If I'm to do it, is there a reason not to back-patch
to all branches with BRIN?

    regards, tom lane


Re: [HACKERS] proposal: schema variables

2017-11-02 Thread Tom Lane
Nico Williams <n...@cryptonector.com> writes:
> With access controls, GUCs could become schema variables, and settings
> from postgresql.conf could move into the database itself (which I think
> would be nice).

People re-propose some variant of that every so often, but it never works,
because it ignores the fact that some of the GUCs' values are needed
before you can access system catalogs at all, or in places where relying
on system catalog access would be a bad idea.

Sure, we could have two completely different configuration mechanisms
so that some of the variables could be "inside the database", but that
doesn't seem like a net improvement to me.  The point of the Grand Unified
Configuration mechanism was to be unified, after all.

I'm on board with having a totally different mechanism for session
variables.  The fact that people have been abusing GUC to store
user-defined variables doesn't make it a good way to do that.

    regards, tom lane


Re: [HACKERS] ArrayLists instead of List (for some things)

2017-11-02 Thread Tom Lane
David Rowley <david.row...@2ndquadrant.com> writes:
>> It seems to me that a whole lot of the complaints about this could be
>> resolved simply by improving the List infrastructure to allocate ListCells
>> in batches.  That would address the question of "too much palloc traffic"
>> and greatly improve the it-accesses-all-over-memory situation too.
>> 
>> Possibly there are more aggressive changes that could be implemented
>> without breaking too many places, but I think it would be useful to
>> start there and see what it buys us.

> Yeah, certainly would be a good way of determining the worth of changing.

BTW, with just a little more work that could be made to address the
list-nth-is-expensive problem too.  I'm imagining a call that preallocates
an empty list with room for N ListCells (or perhaps, if we need to
preserve compatibility with NIL == NULL, creates a single-element list
with room for N-1 more cells), plus some tracking in list.c of whether
the list cells have been consumed in order.  Given the typical use-case of
building lists strictly with lappend, list_nth() could index directly into
the ListCell slab as long as it saw the List header's "is_linear" flag was
true.
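
Roughly like this (purely a sketch of the idea; none of these fields
exist in today's pg_list.h):

typedef struct List
{
	NodeTag		type;
	int			length;
	ListCell   *head;
	ListCell   *tail;
	ListCell   *slab;		/* preallocated contiguous cells, or NULL */
	bool		is_linear;	/* true while cells are consumed in order */
} List;

static void *
list_nth_fast(const List *list, int n)
{
	if (list->slab && list->is_linear)
		return list->slab[n].data.ptr_value;	/* O(1) indexed access */
	return list_nth(list, n);	/* else fall back to the O(n) walk */
}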

regards, tom lane


Re: [HACKERS] ArrayLists instead of List (for some things)

2017-11-02 Thread Tom Lane
David Rowley <david.row...@2ndquadrant.com> writes:
> On 3 November 2017 at 03:17, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> We've jacked up the List API and driven a new implementation underneath
>> once before.  Maybe it's time to do that again.

> Maybe, but the new implementation is not going to do well with places
> where we perform lcons(). Probably many of those places could be
> changed to lappend(), but I bet there's plenty that need prepend.

[ shrug... ] To me, that means this implementation isn't necessarily
the right solution.

It seems to me that a whole lot of the complaints about this could be
resolved simply by improving the List infrastructure to allocate ListCells
in batches.  That would address the question of "too much palloc traffic"
and greatly improve the it-accesses-all-over-memory situation too.

Possibly there are more aggressive changes that could be implemented
without breaking too many places, but I think it would be useful to
start there and see what it buys us.

regards, tom lane


Re: [HACKERS] ArrayLists instead of List (for some things)

2017-11-02 Thread Tom Lane
David Rowley <david.row...@2ndquadrant.com> writes:
> Comments on the design are welcome, but I was too late to the
> commitfest, so there are other priorities. However, if you have a
> strong opinion, feel free to voice it.

I do not like replacing Lists piecemeal; that's a recipe for ongoing
API breakage and back-patching pain.  Plus we'll then have *four*
different linked-list implementations in the backend, which sure
seems like too many.

We've jacked up the List API and driven a new implementation underneath
once before.  Maybe it's time to do that again.

        regards, tom lane


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Tom Lane
Andres Freund <and...@anarazel.de> writes:
> Do we care about people upgrading to unreleased versions? We could do
> nothing, document it in the release notes, or ???

Do nothing.

    regards, tom lane


Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans

2017-11-01 Thread Tom Lane
Alexander Kuzmenkov <a.kuzmen...@postgrespro.ru> writes:
> On 04.09.2017 15:17, Alexey Chernyshov wrote:
>> make check-world fails on contrib/postgres_fdw because of Subquery Scan on 
>> ... Probably, query plan was changed.

> Thanks for testing! This is the same problem as the one in 
> 'select_distinct' I mentioned before. I changed the test, the updated 
> patch is attached.

I've pushed the executor part of this, but mostly not the planner part,
because I didn't think the latter was anywhere near ready for prime
time: the regression test changes it was causing were entirely bogus.

You had basically two categories of plan changes showing up in the
regression tests.  One was insertion of Subquery Scan nodes where
they hadn't been before; that was because the use_physical_tlist
change broke the optimization that removed no-op Subquery Scans.
I fixed that by narrowing the use_physical_tlist change to apply
only to BitmapHeapPath nodes, which is the only case where there
would be any benefit anyway.  The remaining plan diffs after making
that change all amounted to replacing regular index-only scan plans
with bitmap scans, which seems to me to be silly on its face: if we
can use an IOS then it is unlikely that a bitmap scan will be better.
Those changes indicate that the cost adjustment you'd inserted in
cost_bitmap_heap_scan was way too optimistic, which is hardly
surprising.  I think a realistic adjustment would have to account
for all or most of these factors:

* Whether the index AM is ever going to return recheck = false.
The planner has no way to know that at present, but since there are
lots of cases where it simply won't ever happen, I think assuming
that it always will is not acceptable.  Perhaps we could extend the
AM API so that we could find out whether recheck would happen always,
never, or sometimes.  (Doing better than "sometimes" might be too hard,
but I think most opclasses are going to be "always" or "never" anyway.)

* Whether the bitmap will become lossy, causing us to have to make
rechecks anyway.  We could probably estimate that pretty well based
on comparing the initial number-of-pages estimate to work_mem.

* Whether the plan will need to fetch heap tuples to make filter-qual
checks.  In principle the planner ought to know that.  In practice,
right now it doesn't bother to figure out whether the qual will be
empty until createplan.c time, because that's rather a significant
amount of work and it's undesirable to expend it for paths we might
not end up using.  We might be able to approximate it better than
we do now, though.  It's a live problem for regular indexscan costing
as well as bitmap scans, IIRC.

* What fraction of the table is actually all-visible.  You'd effectively
hard-wired that at 50%, but it's easy to do better --- the regular
IOS code does

if (indexonly)
pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));

and it would be appropriate to do the same here if we conclude that
the other gating conditions apply.
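
Put together, the adjustment might take a shape about like this (all
three gating booleans are hypothetical; they'd have to come out of the
AM API and planner work described above):

if (recheck_never && !expect_lossy && !need_filter_quals)
	pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));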

Without a patch that deals more realistically with these concerns,
I think we're better off not changing the cost estimate at all.

As far as the executor side goes, I made several cosmetic changes
and one not so cosmetic one: I changed the decision about whether
to prefetch so that it looks at whether the potential prefetch
page is all-visible.  This still relies on the same assumption you
had that the recheck flag will stay the same from one page to the
next, but at least it's not assuming that the all-visible state
will stay the same.

I'm going to mark the CF entry closed, but if you feel like working
on the cost estimate business, feel free to submit another patch
for that.

regards, tom lane


Re: [HACKERS] ALTER COLUMN TYPE vs. domain constraints

2017-11-01 Thread Tom Lane
Michael Paquier <michael.paqu...@gmail.com> writes:
> On Fri, Oct 27, 2017 at 11:15 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> We could consider back-patching the attached to cover this, but
>> I'm not entirely sure it's worth the trouble, because I haven't
>> thought of any non-silly use-cases in the absence of domains
>> over composite.  Comments?

> There are no real complaints about the current behavior, aren't there?
> So only patching HEAD seems enough to me.

Yeah, we can leave it till someone does complain.

> You have added a comment on the constraint to make sure that it
> remains up on basically this ALTER TYPE. Querying pg_obj_description
> would make sure that the comment on the constraint is still here.

Done.

> +RebuildDomainConstraintComment(AlteredTableInfo *tab, int pass, Oid objid,
> +  List *domname, char *conname)
> There is much duplication with RebuildConstraintComment. Why not
> grouping both under say RebuildObjectComment()?

True.  I'd originally expected the code to differ more, but we can
merge these easily enough.

> I would think about
> having cmd->objtype and cmd->object passed as arguments, and then
> remove rel and domname from the existing arguments.

Doing it like that requires the callers to do work (to prepare the object
ID data structure) that's wasted if there's no comment, which most often
there wouldn't be, I suspect.  Seems better to just pass the info the
caller does have and let this function do the rest.

Pushed, thanks for reviewing!

regards, tom lane


Re: [HACKERS] Account for cost and selectivity of HAVING quals

2017-11-01 Thread Tom Lane
Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> writes:
> On Wed, Nov 1, 2017 at 3:15 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> here's a patch to fix the planner so that eval costs and selectivity of
>> HAVING quals are factored into the appropriate plan node numbers.
>> ...
>> + /* Add cost of qual, if any --- but we ignore its selectivity */

> And may be we should try to explain why can we ignore selectivity.
> Similarly for the changes in create_minmaxagg_path().

I'm sure you realize that's because the estimate is already just one
row ... but sure, we can spell that out.

regards, tom lane


Re: [HACKERS] strange relcache.c debug message

2017-11-01 Thread Tom Lane
Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
> While messing with BRIN bugs, I noticed this debug message in the server
> log:
> 2017-11-01 12:33:24.042 CET [361429] DEBUG:  rehashing catalog cache id 14 
> for pg_opclass; 17 tups, 8 buckets en carácter 194
> notice that at the end it says "at character 194".  I suppose that's
> because of some leftover errposition() value, but why is it being
> reported in this message?

IIRC, there are places in the parser that set up an error callback that
will attach an errposition to any message at all that gets reported within
a particular chunk of code.  It's meant to catch something like "name not
found" but could produce this too.

regards, tom lane


Re: [HACKERS] Account for cost and selectivity of HAVING quals

2017-10-31 Thread Tom Lane
"David G. Johnston" <david.g.johns...@gmail.com> writes:
> On Tue, Oct 31, 2017 at 4:31 PM, Tels <nospam-pg-ab...@bloodgate.com> wrote:
>> That looks odd to me, it first uses output_tuples in a formula, then
>> overwrites the value with a new value. Should these lines be swapped?

> ​IIUC it is correct: the additional total_cost comes from processing every
> output group to check whether it is qualified - since every group is
> checked the incoming output_tuples from the prior grouping is used.

Right --- we'll expend the effort to compute the HAVING expression once
per group row, whether the row passes the qual or not.

regards, tom lane


Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors

2017-10-31 Thread Tom Lane
"David G. Johnston" <david.g.johns...@gmail.com> writes:
> ​Definitely moderates my opinion in my concurrent emai​l...though
> postponement is not strictly bad given the seeming frequency of the
> existing problematic syntax in the wild already.

Yeah, I'd hoped to get some capability extensions done here before v10
shipped, in line with the theory I've expressed in the past that it's
better if you can point to actual new features justifying a compatibility
break.  However, that didn't happen in time.

I'm disinclined to revert the change though; if there are people making
use of this oddity now, then the longer we leave it in place the more
people are going to be hurt when we do break it.

If I had a time machine, I'd go back and fix the original multi-column
SET patch so that it required the word ROW in all cases --- then at
least it'd have accepted a conformant subset of the standard syntax.
Oh well.

    regards, tom lane


Re: [HACKERS] PostgreSQL 10 parenthesized single-column updates can produce errors

2017-10-31 Thread Tom Lane
Rob McColl <r...@robmccoll.com> writes:
> Attaching patch... :-/

The reason why hacking your way to a backwards-compatible solution is
a bad idea is that it breaks the SQL standard compliance we're trying to
achieve here.  According to the spec, the elements of a parenthesized
SET list should be assigned from the fields of a composite RHS.  If
there's just one element of the SET list, the RHS should be a single-field
composite value, and this syntax should result in extracting its field.
This patch doesn't do that; it preserves our previous broken behavior,
and thus just puts off the day of reckoning that inevitably will come.

As a concrete example, the spec says this should work:

create table t (f1 int);
update t set (f1) = row(4);

and it does in v10, but (without having tested) your patch will break it.
This is not so exciting for simple row constructors, where you could just
leave off the word ROW; but it is a critical distinction if we're ever to
get to the point of allowing other composite-returning constructs here,
for example composite-returning functions.
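
To spell out where that's headed (the function case is hypothetical;
today's grammar intentionally rejects it):

create table t2 (f1 int, f2 text);
update t2 set (f1, f2) = (select 4, 'four');           -- works today
-- update t2 set (f1, f2) = some_composite_func(...);  -- someday, per spec

Keeping the old single-column hack alive would foreclose ever accepting
anything there beyond a parenthesized scalar expression.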

regards, tom lane


[HACKERS] Account for cost and selectivity of HAVING quals

2017-10-31 Thread Tom Lane
Pursuant to the discussion at
https://www.postgresql.org/message-id/20171029112420.8920b5f...@mx.zeyos.com
here's a patch to fix the planner so that eval costs and selectivity of
HAVING quals are factored into the appropriate plan node numbers.
Perhaps unsurprisingly, this doesn't change the results of any
existing regression tests.
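
(To see the effect, consider a query on the order of

explain select cust, sum(amount) from payments
group by cust having sum(amount) > 100;

where, with the patch, the Aggregate node's cost now includes evaluating
the HAVING qual once per group and its row estimate is scaled by the
qual's selectivity.  Table and column names invented, of course.)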

It's slightly annoying that this approach will result in calculating
the eval costs and selectivity several times, once for each aggregation
plan type we consider.  I thought about inserting RestrictInfo nodes
into the havingQual so that those numbers could be cached, but that turned
out to break various code that doesn't expect to see such nodes there.
I'm not sure it's worth going to the trouble of fixing that; in the big
scheme of things, the redundant calculations likely don't cost much, since
we aren't going to have relevant statistics.

Comments?  If anyone wants to do a real review of this, I'm happy to stick
it into the upcoming CF; but without an expression of interest, I'll just
push it.  I don't think there's anything terribly controversial here.

regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ce32b8a..98fb16e 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*** void
*** 1874,1879 
--- 1874,1880 
  cost_agg(Path *path, PlannerInfo *root,
  		 AggStrategy aggstrategy, const AggClauseCosts *aggcosts,
  		 int numGroupCols, double numGroups,
+ 		 List *quals,
  		 Cost input_startup_cost, Cost input_total_cost,
  		 double input_tuples)
  {
*** cost_agg(Path *path, PlannerInfo *root,
*** 1955,1960 
--- 1956,1981 
  		output_tuples = numGroups;
  	}
  
+ 	/*
+ 	 * If there are quals (HAVING quals), account for their cost and
+ 	 * selectivity.
+ 	 */
+ 	if (quals)
+ 	{
+ 		QualCost	qual_cost;
+ 
+ 		cost_qual_eval(&qual_cost, quals, root);
+ 		startup_cost += qual_cost.startup;
+ 		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
+ 
+ 		output_tuples = clamp_row_est(output_tuples *
+ 	  clauselist_selectivity(root,
+ 			 quals,
+ 			 0,
+ 			 JOIN_INNER,
+ 			 NULL));
+ 	}
+ 
  	path->rows = output_tuples;
  	path->startup_cost = startup_cost;
  	path->total_cost = total_cost;
*** cost_windowagg(Path *path, PlannerInfo *
*** 2040,2051 
--- 2061,2075 
  void
  cost_group(Path *path, PlannerInfo *root,
  		   int numGroupCols, double numGroups,
+ 		   List *quals,
  		   Cost input_startup_cost, Cost input_total_cost,
  		   double input_tuples)
  {
+ 	double		output_tuples;
  	Cost		startup_cost;
  	Cost		total_cost;
  
+ 	output_tuples = numGroups;
  	startup_cost = input_startup_cost;
  	total_cost = input_total_cost;
  
*** cost_group(Path *path, PlannerInfo *root
*** 2055,2061 
  	 */
  	total_cost += cpu_operator_cost * input_tuples * numGroupCols;
  
! 	path->rows = numGroups;
  	path->startup_cost = startup_cost;
  	path->total_cost = total_cost;
  }
--- 2079,2105 
  	 */
  	total_cost += cpu_operator_cost * input_tuples * numGroupCols;
  
! 	/*
! 	 * If there are quals (HAVING quals), account for their cost and
! 	 * selectivity.
! 	 */
! 	if (quals)
! 	{
! 		QualCost	qual_cost;
! 
! 		cost_qual_eval(&qual_cost, quals, root);
! 		startup_cost += qual_cost.startup;
! 		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
! 
! 		output_tuples = clamp_row_est(output_tuples *
! 	  clauselist_selectivity(root,
! 			 quals,
! 			 0,
! 			 JOIN_INNER,
! 			 NULL));
! 	}
! 
! 	path->rows = output_tuples;
  	path->startup_cost = startup_cost;
  	path->total_cost = total_cost;
  }
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1c84a2c..f620243 100644
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
*** choose_hashed_setop(PlannerInfo *root, L
*** 977,982 
--- 977,983 
  	 */
  	cost_agg(&hashed_p, root, AGG_HASHED, NULL,
  			 numGroupCols, dNumGroups,
+ 			 NIL,
  			 input_path->startup_cost, input_path->total_cost,
  			 input_path->rows);
  
*** choose_hashed_setop(PlannerInfo *root, L
*** 991,996 
--- 992,998 
  			  input_path->rows, input_path->pathtarget->width,
  			  0.0, work_mem, -1.0);
  	cost_group(&sorted_p, root, numGroupCols, dNumGroups,
+ 			   NIL,
  			   sorted_p.startup_cost, sorted_p.total_cost,
  			   input_path->rows);
  
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 2d491eb..eafec2a 100644
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*** create_result_path(PlannerInfo *root, Re
*** 1374,1379 
--- 1374,1380 
  	pa

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
I wrote:
> maybe
> we just have some run-of-the-mill bugs to find, like the off-the-end
> bug I spotted in brin_doupdate.  There's apparently at least one
> more, but given the error message it must be something like not
> checking for a page to have turned into a revmap page.  Shouldn't
> be too hard to find...

Actually, I think it might be as simple as the attached.
brin_getinsertbuffer checks for the old page having turned into revmap,
but the "samepage" path in brin_doupdate does not :-(

With this applied, Alvaro's version of the test case has survived
without error for quite a bit longer than its former MTBF.  There
might still be some issues though in other code paths.

        regards, tom lane

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..b0f86f3 100644
*** a/src/backend/access/brin/brin_pageops.c
--- b/src/backend/access/brin/brin_pageops.c
*** brin_doupdate(Relation idxrel, BlockNumb
*** 113,121 
  
  	/*
  	 * Check that the old tuple wasn't updated concurrently: it might have
! 	 * moved someplace else entirely ...
  	 */
! 	if (!ItemIdIsNormal(oldlp))
  	{
  		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
  
--- 113,127 
  
  	/*
  	 * Check that the old tuple wasn't updated concurrently: it might have
! 	 * moved someplace else entirely, and for that matter the whole page
! 	 * might've become a revmap page.  Note that in the first two cases
! 	 * checked here, the "oldlp" we just calculated is garbage; but
! 	 * PageGetItemId() is simple enough that it was safe to do that
! 	 * calculation anyway.
  	 */
! 	if (!BRIN_IS_REGULAR_PAGE(oldpage) ||
! 		oldoff > PageGetMaxOffsetNumber(oldpage) ||
! 		!ItemIdIsNormal(oldlp))
  	{
  		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);
  

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> I really don't understand how any of this "let's release the buffer
>> lock and then take it back later" logic is supposed to work reliably.

> So summarize_range first inserts the placeholder tuple, which is there
> purposefully for other processes to update concurrently; then, without
> blocking any other process, scan the page range and update the
> placeholder tuple (this could take a long time, so it'd be a bad idea
> to hold buffer lock for that long).

Yeah, agreed, and your upthread point about avoiding deadlock when we
need to take two buffer locks at the same time is also something that
it's hard to see any other way around.  Maybe we just have to find a
way to make the existing structure work.  The sticking point is that
not only might the tuple we expected have been deleted, but someone
could have put an unrelated tuple in its place.  Are you confident
that you can detect that and recover from it?  If you're sure that
brin_tuples_equal() is trustworthy for this purpose, then maybe
we just have some run-of-the-mill bugs to find, like the off-the-end
bug I spotted in brin_doupdate.  There's apparently at least one
more, but given the error message it must be something like not
checking for a page to have turned into a revmap page.  Shouldn't
be too hard to find...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL procedures

2017-10-31 Thread Tom Lane
Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
> I've been working on SQL procedures.

No comment yet on the big picture here, but ...

> The provided procedural languages (an ever more
> confusing term) each needed a small touch-up to handle pg_proc entries
> with prorettype == 0.

Putting 0 in prorettype seems like a pretty bad idea.  Why not use VOIDOID
for the prorettype value?  Or if there is some reason why "void" isn't the
right pseudotype, maybe you should invent a new one, analogous to the
"trigger" and "event_trigger" pseudotypes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists

2017-10-31 Thread Tom Lane
Vitaly Burovoy <vitaly.buro...@gmail.com> writes:
> I left another "NULL::name AS rolname" at
> src/bin/pg_dump/pg_dump.c:2978 because I can't check it (remoteVersion <
> 9), and it is already under a strict "selectSourceSchema(fout,
> "pg_catalog");" schema setting.

Yeah, there are quite a few unqualified casts in pg_dump, but AFAICS
all the rest are OK because the search_path is just pg_catalog.

But I did find psql's describe.c making a similar mistake :-(.
Pushed that along with your fix.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
... calling RecordPageWithFreeSpace
while holding at least one and possibly two buffer locks.  Shouldn't
that be done someplace else?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix dumping pre-10 DBs by pg_dump10 if table "name" exists

2017-10-31 Thread Tom Lane
Vitaly Burovoy <vitaly.buro...@gmail.com> writes:
> Recently my colleagues found a bug.

> -   "SELECT 'bigint'::name AS sequence_type, "
> +   "SELECT 'bigint'::pg_catalog.name AS sequence_type, "

Good catch, but I think we could simplify this by just omitting the cast
altogether:

- "SELECT 'bigint'::name AS 
sequence_type, "
+ "SELECT 'bigint' AS 
sequence_type, 

pg_dump doesn't particularly care whether the column comes back marked
as 'name' or 'text' or 'unknown'.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add some const decorations to prototypes

2017-10-31 Thread Tom Lane
Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
> Here is a patch that adds const decorations to many char * arguments in
> functions.  It should have no impact otherwise; there are very few code
> changes caused by it.

+1 in general ...

> Some functions have a strtol()-like behavior
> where they take in a const char * and return a pointer into that as
> another argument.  In those cases, I added a cast or two.

... but I'm not sure that it's an improvement in cases where you have to
cast away the const somewhere else.  I realize that strtol has an ancient
pedigree, but I do not think it's very good design.
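
To spell out the wart with a made-up example (not code from the patch):
strtol takes a const char * in, but hands the end pointer back as plain
char *, so a const-correct caller has to launder constness through it:

#include <stdlib.h>

long
parse_leading_int(const char *s, const char **rest)
{
	char	   *end;		/* strtol insists on char **, not const char ** */
	long		val = strtol(s, &end, 10);

	*rest = end;			/* ... and the dropped const resurfaces here */
	return val;
}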

        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query regarding permission on table_column%type access

2017-10-31 Thread Tom Lane
Stephen Frost <sfr...@snowman.net> writes:
> * Neha Sharma (neha.sha...@enterprisedb.com) wrote:
>> I have observed that even if the user does not have permission on a
>> table(created in by some other user),the function parameter still can have
>> a parameter of that table_column%type.

> This is because the creation of the table also creates a type of the
> same name and the type's permissions are independent of the table's.  I
> imagine that you could REVOKE USAGE ON TYPE from the type and deny
> access to that type if you wanted to.

Right.  (I checked, seems to work as expected.)

> I'm not sure that we should change the REVOKE on the table-level to also
> mean to REVOKE access to the type automatically (and what happens if you
> GRANT the access back for the table..?

It seems pretty silly for privileges on table rowtypes to behave
differently from those on other rowtypes.
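
For the archives, the workaround Stephen mentions looks roughly like
this (object names invented for illustration):

create table secrets (val int);	-- implicitly creates composite type "secrets"
create role alice;
revoke usage on type secrets from public;
-- now "create function peek(s secrets) ..." run as alice fails with
-- something like: permission denied for type secrets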

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-31 Thread Tom Lane
Michael Paquier <michael.paqu...@gmail.com> writes:
> On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Yeah, we're still missing an understanding of why we didn't see it
>> before; the inadequate locking was surely there before.

> Because 24992c6d has added a check on the offset number by using
> PageIndexTupleDeleteNoCompact() in brin_doupdate() making checks
> tighter, no?

No, I don't see how it's tighter.  The old code matched supplied
offnum(s) against the indexes of not-unused items, and then after
that loop it complained if they weren't all matched.  So it should
also have failed, albeit with a different error message, if it were
passed an offnum corresponding to a no-longer-live tuple.

    regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> So: I put the blame on the fact that summarize_range() thinks that
>> the tuple offset it has for the placeholder tuple is guaranteed to
>> hold good, even across possibly-long intervals where it's holding
>> no lock on the containing buffer.

> Yeah, I think this is a pretty reasonable explanation for the problem.
> I don't understand why it doesn't fail in 9.6.

Yeah, we're still missing an understanding of why we didn't see it
before; the inadequate locking was surely there before.  I'm guessing
that somehow the previous behavior of PageIndexDeleteNoCompact managed
to mask the problem (perhaps only by not throwing an error, which doesn't
imply that the index state was good afterwards).  But I don't see quite
how it did that.

One thing I think we do know now is that the bug is triggered by two
concurrent executions of summarize_range.  So I'd look for edge cases
like whether the placeholder tuple can be deleted and then reinserted
into the same lp index.

    regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
I wrote:
> Hmm.  The index offnum being complained of is one past the end of the
> lp array.  I think I see what about that commit changed the behavior:
> the old code for PageIndexDeleteNoCompact never changed the length
> of the lp array, except in the corner case where the page is becoming
> completely empty.  The new code will shorten the lp array (decrease
> phdr->pd_lower) if it's told to remove the last item.  So if you make
> two successive calls specifying the same offnum, and it's the last one
> on the page, the second one will fail with the symptoms we see here.
> However, so far as I can see, a sequence like that would have failed
> before too, just with a different error message, because once the
> first call had marked the item unused, the second call would not see
> it as a candidate to match.  So I'm not quite sure how that's related
> ... but it seems like it must be.

I'm still confused about why it didn't fail before, but after adding
some additional code to log each call of PageIndexTupleDeleteNoCompact,
I think I've got a smoking gun:

2017-10-30 18:18:44.321 EDT [10932] LOG:  deleting tuple 292 (of 292) in rel brin_test_c_idx page 2
2017-10-30 18:18:44.321 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:18:44.393 EDT [10932] LOG:  deleting tuple 292 (of 292) in rel brin_test_d_idx page 2
2017-10-30 18:18:44.393 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:18:53.428 EDT [10932] LOG:  deleting tuple 186 (of 186) in rel brin_test_e_idx page 3
2017-10-30 18:18:53.428 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:19:13.794 EDT [10938] LOG:  deleting tuple 186 (of 186) in rel brin_test_e_idx page 4
2017-10-30 18:19:13.794 EDT [10938] STATEMENT:  insert into brin_test select
 mod(i,10)/25,
 mod(i,10)/25,
 mod(i,10)/25,
 mod(i,10)/25,
 md5((mod(i,10)/25)::text)::uuid
from generate_series(1,10) s(i)
2017-10-30 18:19:13.795 EDT [10932] LOG:  deleting tuple 186 (of 185) in rel brin_test_e_idx page 4
2017-10-30 18:19:13.795 EDT [10932] STATEMENT:  vacuum brin_test
2017-10-30 18:19:13.795 EDT [10932] PANIC:  invalid index offnum: 186
2017-10-30 18:19:13.795 EDT [10932] STATEMENT:  vacuum brin_test

So what happened here is that the inserting process decided to
summarize concurrently with the VACUUM process, and the inserting
process deleted (or maybe just updated/moved?) the placeholder tuple
that VACUUM was expecting to delete, and then we get the PANIC because
the tuple we're expecting to delete is already gone.

So: I put the blame on the fact that summarize_range() thinks that
the tuple offset it has for the placeholder tuple is guaranteed to
hold good, even across possibly-long intervals where it's holding
no lock on the containing buffer.

Fixing this without creating new problems is beyond my familiarity
with the BRIN code.  It looks like it might be nontrivial :-(

regards, tom lane

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 80f803e..04ad804 100644
*** a/src/backend/access/brin/brin_pageops.c
--- b/src/backend/access/brin/brin_pageops.c
*** brin_doupdate(Relation idxrel, BlockNumb
*** 243,249 
  		if (extended)
  			brin_page_init(BufferGetPage(newbuf), BRIN_PAGETYPE_REGULAR);
  
! 		PageIndexTupleDeleteNoCompact(oldpage, oldoff);
  		newoff = PageAddItem(newpage, (Item) newtup, newsz,
  			 InvalidOffsetNumber, false, false);
  		if (newoff == InvalidOffsetNumber)
--- 243,249 
  		if (extended)
  			brin_page_init(BufferGetPage(newbuf), BRIN_PAGETYPE_REGULAR);
  
! 		PageIndexTupleDeleteNoCompact(idxrel, oldbuf, oldpage, oldoff);
  		newoff = PageAddItem(newpage, (Item) newtup, newsz,
  			 InvalidOffsetNumber, false, false);
  		if (newoff == InvalidOffsetNumber)
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 22f2076..4d5dad3 100644
*** a/src/backend/access/brin/brin_revmap.c
--- b/src/backend/access/brin/brin_revmap.c
*** brinRevmapDesummarizeRange(Relation idxr
*** 409,415 
  	ItemPointerSetInvalid(&invalidIptr);
  	brinSetHeapBlockItemptr(revmapBuf, revmap->rm_pagesPerRange, heapBlk,
  			invalidIptr);
! 	PageIndexTupleDeleteNoCompact(regPg, regOffset);
  	/* XXX record free space in FSM? */
  
  	MarkBufferDirty(regBuf);
--- 409,415 
  	ItemPointerSetInvalid(&invalidIptr);
  	brinSetHeapBlockItemptr(revmapBuf, revmap->rm_pagesPerRange, heapBlk,
  			invalidIptr);
! 	PageIndexTupleDeleteNoCompact(idxrel, regBuf, regPg, regOffset);
  	/* XXX record free space in FSM? */
  
  	MarkBufferDirty(regBuf);
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 60daa54..c8cc9f8 100644
*** a/src/backend/access/brin/brin_xlog.c
--- b/src/backend/access/brin/brin_xlog.c
*** brin_xlog_update(XLogRe

Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
> On 10/30/2017 09:03 PM, Tom Lane wrote:
>> [ scratches head ... ]  Not sure how that could've introduced any
>> problems?  Will look.

> Not sure, but I can confirm Michael's findings - I've been unable to
> reproduce the issue on 1a4be103a5 even after 20 minutes, and on
> 24992c6db9 it failed after only 2.

Hmm.  The index offnum being complained of is one past the end of the
lp array.  I think I see what about that commit changed the behavior:
the old code for PageIndexDeleteNoCompact never changed the length
of the lp array, except in the corner case where the page is becoming
completely empty.  The new code will shorten the lp array (decrease
phdr->pd_lower) if it's told to remove the last item.  So if you make
two successive calls specifying the same offnum, and it's the last one
on the page, the second one will fail with the symptoms we see here.
However, so far as I can see, a sequence like that would have failed
before too, just with a different error message, because once the
first call had marked the item unused, the second call would not see
it as a candidate to match.  So I'm not quite sure how that's related
... but it seems like it must be.

Anyway, my opinion ATM is that PageIndexTupleDeleteNoCompact is fine,
and what we're looking at is some kind of bug in summarize_range
or brin_doupdate, causing them to pass a offset beyond the end of
the page in some corner case.

    regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

2017-10-30 Thread Tom Lane
Michael Paquier <michael.paqu...@gmail.com> writes:
> b1328d78f is close enough, but per what I see that's actually not
> true. I have been able to reproduce the problem, which shows up within
> a window of 2-4 minutes. Still sometimes it can take way longer, and I
> spotted one test where it took 15 minutes to show up... So, bisecting
> with a test that looks for core files for 20 minutes, I am seeing that
> the following commit is actually at fault:

> commit 24992c6db9fd40f342db1f22747ec9e56483796d
> Author: Tom Lane <t...@sss.pgh.pa.us>
> Date:   Fri Sep 9 19:00:59 2016 -0400

> Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.

[ scratches head ... ]  Not sure how that could've introduced any
problems?  Will look.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Rewriting PL/Python's typeio code

2017-10-30 Thread Tom Lane
I started to work on teaching PL/Python about domains over composite,
and soon found that it was a can of worms.  Aside from the difficulty
of shoehorning that in with a minimal patch, there were pre-existing
problems.  I found that it didn't do arrays of domains right either
(ok, that's an oversight in my recent commit c12d570fa), and there
are assorted bugs that have been there much longer.  For instance, if
you return a composite type containing a domain, it fails to enforce
domain constraints on the type's field.  Also, if a transform function
is in use, it missed enforcing domain constraints on the result.
And in many places it missed checking domain constraints on null values,
because the plpy_typeio code simply wasn't called for Py_None.

Plus the code was really messy and duplicative, e.g. domain_check was
called in three different places ... which wasn't enough.  It also did
a lot of repetitive catalog lookups.

So, I ended up rewriting/refactoring pretty heavily.  The new idea
is to solve these problems by making heavier use of recursion between
plpy_typeio's conversion functions, and in particular to treat domains
as if they were containers.  So now there's exactly one place to call
domain_check, in a conversion function that has first recursed to do
conversion of the base type.  Nulls are treated more honestly, and
the SQL-to-Python functions are more careful about not leaking memory.
Also, I solved some of the repetitive catalog lookup problems by
making the code rely as much as possible on the typcache (which I think
didn't exist when this code originated).  I added a couple of small
features to typcache to help with that.

This is a fairly large amount of code churn, and it could stand testing
by someone who's more Python-savvy than I am.  So I'll stick it into
the upcoming commitfest as a separate item.

regards, tom lane

diff --git a/contrib/hstore_plpython/expected/hstore_plpython.out b/contrib/hstore_plpython/expected/hstore_plpython.out
index df49cd5..1ab5fee 100644
*** a/contrib/hstore_plpython/expected/hstore_plpython.out
--- b/contrib/hstore_plpython/expected/hstore_plpython.out
*** AS $$
*** 68,79 
  val = [{'a': 1, 'b': 'boo', 'c': None}, {'d': 2}]
  return val
  $$;
!  SELECT test2arr();
 test2arr   
  --
   {"\"a\"=>\"1\", \"b\"=>\"boo\", \"c\"=>NULL","\"d\"=>\"2\""}
  (1 row)
  
  -- test as part of prepare/execute
  CREATE FUNCTION test3() RETURNS void
  LANGUAGE plpythonu
--- 68,97 
  val = [{'a': 1, 'b': 'boo', 'c': None}, {'d': 2}]
  return val
  $$;
! SELECT test2arr();
 test2arr   
  --
   {"\"a\"=>\"1\", \"b\"=>\"boo\", \"c\"=>NULL","\"d\"=>\"2\""}
  (1 row)
  
+ -- test python -> domain over hstore
+ CREATE DOMAIN hstore_foo AS hstore CHECK(VALUE ? 'foo');
+ CREATE FUNCTION test2dom(fn text) RETURNS hstore_foo
+ LANGUAGE plpythonu
+ TRANSFORM FOR TYPE hstore
+ AS $$
+ return {'a': 1, fn: 'boo', 'c': None}
+ $$;
+ SELECT test2dom('foo');
+  test2dom  
+ ---
+  "a"=>"1", "c"=>NULL, "foo"=>"boo"
+ (1 row)
+ 
+ SELECT test2dom('bar');  -- fail
+ ERROR:  value for domain hstore_foo violates check constraint "hstore_foo_check"
+ CONTEXT:  while creating return value
+ PL/Python function "test2dom"
  -- test as part of prepare/execute
  CREATE FUNCTION test3() RETURNS void
  LANGUAGE plpythonu
diff --git a/contrib/hstore_plpython/sql/hstore_plpython.sql b/contrib/hstore_plpython/sql/hstore_plpython.sql
index 911bbd6..2c54ee6 100644
*** a/contrib/hstore_plpython/sql/hstore_plpython.sql
--- b/contrib/hstore_plpython/sql/hstore_plpython.sql
*** val = [{'a': 1, 'b': 'boo', 'c': None}, 
*** 60,66 
  return val
  $$;
  
!  SELECT test2arr();
  
  
  -- test as part of prepare/execute
--- 60,80 
  return val
  $$;
  
! SELECT test2arr();
! 
! 
! -- test python -> domain over hstore
! CREATE DOMAIN hstore_foo AS hstore CHECK(VALUE ? 'foo');
! 
! CREATE FUNCTION test2dom(fn text) RETURNS hstore_foo
! LANGUAGE plpythonu
! TRANSFORM FOR TYPE hstore
! AS $$
! return {'a': 1, fn: 'boo', 'c': None}
! $$;
! 
! SELECT test2dom('foo');
! SELECT test2dom('bar');  -- fail
  
  
  -- test as part of prepare/execute
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 7aadc5d..f6450c4 100644
*** a/src/backend/utils/cache/typcache.c
--- b/src/backend/utils/cache/typcache.c
*** lookup_type_cache(Oid type_id, int

Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Tom Lane
Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> I think Connor struggles to implement just an operator class.  Advising him
> to implement an index access method is a good way to get him away of
> PostgreSQL hacking for a long time :)

Yeah.  To answer the question a bit more directly: there are not any
contrib modules that add SP-GiST opclasses, but there are some that add
GiST or GIN opclasses, so any one of those would serve as a model for the
basic mechanism of writing an extension.  Just replace the AM-specific
support functions for those AMs with the ones SP-GiST uses.  (You can crib
some code details from the in-core SP-GiST support functions.)
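
For concreteness, the SQL half of such an extension would look roughly
like this (all names hypothetical; the five functions are C functions
written to the SP-GiST support-function API):

create operator class my_spgist_ops
default for type mytype using spgist as
	operator	3	= ,
	function	1	my_config(internal, internal),
	function	2	my_choose(internal, internal),
	function	3	my_picksplit(internal, internal),
	function	4	my_inner_consistent(internal, internal),
	function	5	my_leaf_consistent(internal, internal);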

        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remove secondary checkpoint

2017-10-30 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> I was mostly just thinking out loud, listing another option rather
> than advocating for it.

FWIW, I just wanted the question to be debated and resolved properly.

After rereading the thread Andres pointed to, I thought of a hazard
that I think Andres alluded to, but didn't spell out explicitly:
if we can't read the primary checkpoint, and then back up to a
secondary one and replay as much of WAL as we can read, we may well
be left with an inconsistent database.  This would happen because
some changes that post-date the part of WAL we could read may have
been flushed to disk, if the system previously thought that WAL
up through the primary checkpoint was all safely down to disk.
Therefore, backing up to the secondary checkpoint is *not* a safe
automatic recovery choice, and it's dubious that it's even a useful
approach for manual recovery.  You're really into restore-from-
backup territory at that point.

I'm content now that removing the secondary checkpoint is an OK
decision.  (This isn't a review of Simon's patch, though.)

    regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-10-27 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org> 
> wrote:
>> I noticed that RelationBuildPartitionKey is generating a partition key
>> in a temp context, then creating a private context and copying the key
>> into that.  That seems leftover from some previous iteration of some
>> other patch; I think it's pretty reasonable to create the new context
>> right from the start and allocate the key there directly instead.  Then
>> there's no need for copy_partition_key at all.

> We could do that, but the motivation for the current system was to
> avoid leaking memory in a long-lived context.  I think the same
> technique is used elsewhere for similar reasons.  I admit I haven't
> checked whether there would actually be a leak here if we did it as
> you propose, but I wouldn't find it at all surprising.

Another key point is to avoid leaving a corrupted relcache entry behind
if you fail partway through.  I think that the current coding is
RelationBuildPartitionKey is safe, although it's far too undercommented
for my taste.  (The ordering of the last few steps is *critical*.)

It might work to build the new key in a context that's initially a
child of CurrentMemoryContext, then reparent it to be a child of
CacheMemoryContext when done.  But you'd have to look at whether that
would end up wasting long-lived memory, if the build process generates
any cruft in the same context.
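
In other words, something like this (just a sketch, not a tested patch):

MemoryContext partkeycxt;
MemoryContext oldcxt;

/* build in a context that goes away with the query if we error out */
partkeycxt = AllocSetContextCreate(CurrentMemoryContext,
								   "partition key",
								   ALLOCSET_SMALL_SIZES);
oldcxt = MemoryContextSwitchTo(partkeycxt);
/* ... construct the PartitionKey and all its substructure here ... */
MemoryContextSwitchTo(oldcxt);

/* only after the entry is fully built: make it long-lived */
MemoryContextSetParent(partkeycxt, CacheMemoryContext);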

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scan for cube and seg

2017-10-27 Thread Tom Lane
Robert Haas <robertmh...@gmail.com> writes:
> On Thu, Oct 26, 2017 at 12:22 PM, Andrey Borodin <x4...@yandex-team.ru> wrote:
>> For cube there is new default opclass.

> I seem to recall that changing the default opclass causes unsolvable
> problems with upgrades.  You might want to check the archives for
> previous discussions of this issue; unfortunately, I don't recall the
> details off-hand.

Quite aside from that, replacing the opclass with a new one creates
user-visible headaches that I don't think are justified, i.e. having to
reconstruct indexes in order to get the benefit.

Maybe I'm missing something, but ISTM you could just drop the compress
function and call it good.  This would mean that an IOS scan would
sometimes hand back a toast-compressed value, but what's the problem
with that?

(The only reason for making a decompress function that just detoasts
is that your other support functions then do not have to consider
the possibility that they're handed a toast-compressed value.  I have
not checked whether that really matters for cube.)
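
Concretely, the extension upgrade script would then need only something
like this (opclass family name quoted from memory, so double-check):

alter operator family gist_cube_ops using gist
	drop function 3 (cube);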

    regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ALTER COLUMN TYPE vs. domain constraints

2017-10-27 Thread Tom Lane
I found out that altering a column's type does not play nicely with
domain constraints: tablecmds.c expects that only table constraints
could depend on a column.  Now, it's easy to hit that with domains
over composite, so I propose to fix it in HEAD with the attached
patch.  However, if you really work at it, you can make it fail
in the back branches too:

regression=# create type comptype as (r float8, i float8);
CREATE TYPE
regression=# create domain silly as float8 check ((row(value,0)::comptype).r > 0);
CREATE DOMAIN
regression=# alter type comptype alter attribute r type varchar;
ERROR:  cache lookup failed for relation 0

Before commit 6784d7a1d, the ALTER actually went through, leaving a
mess.  Fortunately it doesn't actually crash afterwards, but you
get things like

regression=# select 0::silly;
ERROR:  ROW() column has type double precision instead of type character varying

We could consider back-patching the attached to cover this, but
I'm not entirely sure it's worth the trouble, because I haven't
thought of any non-silly use-cases in the absence of domains
over composite.  Comments?

regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3ab8087..fc93c4e 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
*** static void ATPostAlterTypeParse(Oid old
*** 426,431 
--- 426,433 
  	 bool rewrite);
  static void RebuildConstraintComment(AlteredTableInfo *tab, int pass,
  		 Oid objid, Relation rel, char *conname);
+ static void RebuildDomainConstraintComment(AlteredTableInfo *tab, int pass,
+ 			   Oid objid, List *domname, char *conname);
  static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
  static void TryReuseForeignKey(Oid oldId, Constraint *con);
  static void change_owner_fix_column_acls(Oid relationOid,
*** AlterTableGetLockLevel(List *cmds)
*** 3319,3324 
--- 3321,3327 
  			case AT_ProcessedConstraint:	/* becomes AT_AddConstraint */
  			case AT_AddConstraintRecurse:	/* becomes AT_AddConstraint */
  			case AT_ReAddConstraint:	/* becomes AT_AddConstraint */
+ 			case AT_ReAddDomainConstraint:	/* becomes AT_AddConstraint */
  if (IsA(cmd->def, Constraint))
  {
  	Constraint *con = (Constraint *) cmd->def;
*** ATRewriteCatalogs(List **wqueue, LOCKMOD
*** 3819,3825 
  			rel = relation_open(tab->relid, NoLock);
  
  			foreach(lcmd, subcmds)
! ATExecCmd(wqueue, tab, rel, (AlterTableCmd *) lfirst(lcmd), lockmode);
  
  			/*
  			 * After the ALTER TYPE pass, do cleanup work (this is not done in
--- 3822,3830 
  			rel = relation_open(tab->relid, NoLock);
  
  			foreach(lcmd, subcmds)
! ATExecCmd(wqueue, tab, rel,
! 		  castNode(AlterTableCmd, lfirst(lcmd)),
! 		  lockmode);
  
  			/*
  			 * After the ALTER TYPE pass, do cleanup work (this is not done in
*** ATExecCmd(List **wqueue, AlteredTableInf
*** 3936,3941 
--- 3941,3953 
  ATExecAddConstraint(wqueue, tab, rel, (Constraint *) cmd->def,
  	true, true, lockmode);
  			break;
+ 		case AT_ReAddDomainConstraint:	/* Re-add pre-existing domain check
+ 		 * constraint */
+ 			address =
+ AlterDomainAddConstraint(((AlterDomainStmt *) cmd->def)->typeName,
+ 		 ((AlterDomainStmt *) cmd->def)->def,
+ 		 NULL);
+ 			break;
  		case AT_ReAddComment:	/* Re-add existing comment */
  			address = CommentObject((CommentStmt *) cmd->def);
  			break;
*** ATPostAlterTypeCleanup(List **wqueue, Al
*** 9616,9622 
  		if (!HeapTupleIsValid(tup)) /* should not happen */
  			elog(ERROR, "cache lookup failed for constraint %u", oldId);
  		con = (Form_pg_constraint) GETSTRUCT(tup);
! 		relid = con->conrelid;
  		confrelid = con->confrelid;
  		conislocal = con->conislocal;
  		ReleaseSysCache(tup);
--- 9628,9642 
  		if (!HeapTupleIsValid(tup)) /* should not happen */
  			elog(ERROR, "cache lookup failed for constraint %u", oldId);
  		con = (Form_pg_constraint) GETSTRUCT(tup);
! 		if (OidIsValid(con->conrelid))
! 			relid = con->conrelid;
! 		else
! 		{
! 			/* must be a domain constraint */
! 			relid = get_typ_typrelid(getBaseType(con->contypid));
! 			if (!OidIsValid(relid))
! elog(ERROR, "could not identify relation associated with constraint %u", oldId);
! 		}
  		confrelid = con->confrelid;
  		conislocal = con->conislocal;
  		ReleaseSysCache(tup);
*** ATPostAlterTypeParse(Oid oldId, Oid oldR
*** 9753,9759 
  
  			foreach(lcmd, stmt->cmds)
  			{
! AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd);
  
  if (cmd->subtype == AT_AddIndex)
  {
--- 9773,9779 
  
  			foreach(lcmd, stmt->cmds)
  			{
! AlterTableCmd *cmd = castNode(AlterTableCmd, lfirst(lcmd));
  
  if (cmd->subtype == AT_AddIndex)
  {

  1   2   3   4   5   6   7   8   9   10   >