Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
return typtup;
+}

A function as general as typeGet() certainly does not belong in
parse_clause.c in the middle of a long list of temporal functions.

This particular function is also a bad idea in general, because typtup
is only a valid pointer until ReleaseSysCache() is called.  This will
appear to work normally because, normally, no relevant cache
invalidation messages will show up at the wrong time, but if one does
then typtup will be pointing off into outer space.  You can find bugs
like this by testing with CLOBBER_CACHE_ALWAYS defined (warning: this
makes things very slow).

But the general point I want to make here is actually about inventing
your own idioms vs. using the ones we've got -- if you're going to
invent new general-purpose primitives, they need to go next to the
existing functions that do similar things, not in whatever part of the
code you first decided you needed them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
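The safe idiom is to copy whatever you need out of the cache entry while it is still pinned, and only then release it. A self-contained sketch of that pattern (the cache here is a toy stand-in, not PostgreSQL's actual syscache API; all names are invented for illustration):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a syscache entry; names are illustrative only. */
typedef struct TypeCacheEntry
{
    int  typlen;
    char typname[64];
} TypeCacheEntry;

/* Simulated cache lookup: returns a pinned entry. */
static TypeCacheEntry *
search_cache(const char *name)
{
    TypeCacheEntry *entry = malloc(sizeof(TypeCacheEntry));

    entry->typlen = 4;
    strncpy(entry->typname, name, sizeof(entry->typname) - 1);
    entry->typname[sizeof(entry->typname) - 1] = '\0';
    return entry;
}

/*
 * Simulated release: after this, the entry may be recycled at any time,
 * so we clobber it (in the spirit of CLOBBER_CACHE_ALWAYS) and free it.
 */
static void
release_cache(TypeCacheEntry *entry)
{
    memset(entry, 0x7f, sizeof(*entry));
    free(entry);
}

/*
 * Safe idiom: copy the needed field while the entry is valid, then
 * release.  Returning the raw entry pointer after release_cache(), as
 * the typeGet() in question effectively does, would hand the caller a
 * dangling pointer.
 */
static int
get_type_len(const char *name)
{
    TypeCacheEntry *tup = search_cache(name);
    int typlen = tup->typlen;   /* copy before releasing */

    release_cache(tup);
    return typlen;
}
```

The same shape applies in the backend: use the data (or copy it) between SearchSysCache() and ReleaseSysCache(), never after.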
Re: [HACKERS] GatherMerge misses to push target list
On Mon, Sep 18, 2017 at 7:02 AM, Rushabh Lathia wrote:
>> In that case, can you please mark the patch [1] as ready for committer in
>> CF app
>
> Done.

I think this patch is mostly correct, but I think the change to
planner.c isn't quite right.  ordered_rel->reltarget is just a dummy
target list that produces nothing.  Instead, I think we should pass
path->pathtarget, representing the idea that whatever Gather Merge
produces as output is the same as what you put into it.  See attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pushdown-gathermerge-tlist.patch
Description: Binary data
Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages
On Thu, Nov 9, 2017 at 3:55 AM, amul sul wrote:
> It took me a little while to understand this calculation.  You have moved this
> code from tbm_create(), but I think you should move the following
> comment as well:

I made an adjustment that I hope will address your concern here, made
a few other adjustments, and committed this.

One point of concern that wasn't entirely addressed in the above
discussion is the accuracy of this formula:

+               lossy_pages = Max(0, heap_pages - maxentries / 2);

When I first looked at Dilip's test results, I thought maybe this
formula was way off.  But on closer study, the formula does a decent
(not fantastic) job of estimating the number of exact pages.  The fact
that the number of lossy pages is off is just because the Mackert and
Lohman formula is overestimating how many pages are fetched.

Now, in Dilip's results, this formula more often than not - but not
invariably - predicted more exact pages than we actually got.  So
possibly instead of maxentries / 2 we could subtract maxentries or
some other multiple of maxentries.  I don't know what's actually best
here, but I think there's a strong argument that this is an
improvement as it stands, and we can adjust it later if it becomes
clear what would be better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
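Restated as a standalone function for clarity -- this is only a sketch of the formula quoted above; the real planner code does the arithmetic inline, with doubles, alongside the Mackert-Lohman fetch estimate:

```c
#include <assert.h>

/*
 * Sketch of the committed lossy-page estimate: once the bitmap would
 * hold more distinct pages than maxentries allows, the excess pages
 * (less a fudge factor of maxentries / 2) are assumed to go lossy.
 * Integer types are used here purely for illustration.
 */
static long
estimate_lossy_pages(long heap_pages, long maxentries)
{
    long lossy_pages = heap_pages - maxentries / 2;

    return lossy_pages > 0 ? lossy_pages : 0;   /* Max(0, ...) */
}

/* The exact pages are then whatever is left over. */
static long
estimate_exact_pages(long heap_pages, long maxentries)
{
    return heap_pages - estimate_lossy_pages(heap_pages, maxentries);
}
```

Subtracting maxentries instead of maxentries / 2, as floated above, would just change the fudge term in estimate_lossy_pages().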
Re: [HACKERS] parallelize queries containing initplans
On Fri, Nov 10, 2017 at 2:24 PM, Tom Lane wrote:
> Robert Haas writes:
>> I decided to try instead teaching the planner to keep track of the
>> types of PARAM_EXEC parameters as they were created, and that seems to
>> work fine.  See 0001, attached.
>
> I did not look at the other part, but 0001 looks reasonable to me.

Thanks for looking.

> I might quibble with the grammar in the generate_new_param comment:
>
> -        * need to record the PARAM_EXEC slot number as being allocated.
> +        * need to make sure we have record the type in paramExecTypes (otherwise,
> +        * there won't be a slot allocated for it).
>
> I'd just go with "need to record the type in ..."

Noted.

> Also, I wonder whether the InvalidOid hack in SS_assign_special_param
> requires commentary.  It might be safer to use a valid type OID there,
> perhaps VOIDOID or INTERNALOID.

I think it's pretty straightforward -- if, as the existing comments
say, no Param node will be present and no value will be stored, then
we do not and cannot record the type of the value that we're not
storing.  There is existing precedent for using InvalidOid to denote
the absence of a parameter -- cf. copyParamList, SerializeParamList.
That convention was not invented by me or the parallel query stuff,
although it has become more widespread for that reason.  I am
disinclined to have this patch invent a New Way To Do It.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
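A minimal sketch of the InvalidOid-as-absent convention referenced above (cf. copyParamList, SerializeParamList): a serializer simply skips slots whose type OID is InvalidOid. The function name and the Oid typedef here are simplified stand-ins, not the backend's actual code:

```c
#include <assert.h>

/* Simplified stand-ins for the backend's definitions. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/*
 * Count how many parameter slots would actually be serialized.
 * Slots whose type is InvalidOid denote absent parameters -- no Param
 * node exists and no value is stored -- so they are skipped, exactly
 * as copyParamList and SerializeParamList skip them.
 */
static int
count_serialized_params(const Oid *paramtypes, int nparams)
{
    int count = 0;

    for (int i = 0; i < nparams; i++)
    {
        if (paramtypes[i] != InvalidOid)
            count++;
    }
    return count;
}
```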
Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
On Wed, Sep 27, 2017 at 7:07 AM, amul sul wrote:
> Attaching POC patch that throws an error in the case of a concurrent update
> to an already deleted tuple due to UPDATE of partition key[1].
>
> In a normal update new tuple is linked to the old one via ctid forming
> a chain of tuple versions but UPDATE of partition key[1] move tuple
> from one partition to an another partition table which breaks that chain.

This patch needs a rebase.  It has one whitespace-only hunk that
should possibly be excluded.

I think the basic idea of this is sound.  Either you or Amit needs to
document the behavior in the user-facing documentation, and it needs
tests that hit every single one of the new error checks you've added
(obviously, the tests will only work in combination with Amit's
patch).  The isolation test framework should be sufficient to write
such tests.

It needs some more extensive comments as well.  The fact that we're
assigning a meaning to ip_blkid -> InvalidBlockNumber is a big deal,
and should at least be documented in itemptr.h in the comments for the
ItemPointerData structure.

I suspect ExecOnConflictUpdate needs an update for the
HeapTupleUpdated case similar to what you've done elsewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
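To illustrate the ip_blkid overloading being discussed: setting the block number to InvalidBlockNumber marks the old tuple as having been moved to another partition, so a concurrent updater chasing the ctid chain can detect the break and error out. The struct below is deliberately simplified (the real ItemPointerData stores the block number as a two-halves BlockIdData), so treat it as a sketch only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Simplified stand-in for ItemPointerData. */
typedef struct ItemPointerData
{
    BlockNumber ip_blkid;   /* real struct: BlockIdData (two uint16s) */
    uint16_t    ip_posid;
} ItemPointerData;

/*
 * Mark a dead tuple's ctid to say "my successor went to another
 * partition"; returns its argument for convenience.
 */
static ItemPointerData *
mark_moved_to_other_partition(ItemPointerData *ctid)
{
    ctid->ip_blkid = InvalidBlockNumber;
    return ctid;
}

/*
 * A concurrent updater following the update chain would call this and
 * raise an error instead of trying to follow a chain that no longer
 * exists within this partition.
 */
static bool
tuple_moved_to_other_partition(const ItemPointerData *ctid)
{
    return ctid->ip_blkid == InvalidBlockNumber;
}
```

This is exactly the kind of invariant that, per the review comment, belongs in the comments for ItemPointerData in itemptr.h.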
Re: [HACKERS] parallelize queries containing initplans
On Tue, Nov 7, 2017 at 4:45 AM, Amit Kapila wrote:
> As mentioned, changed the status of the patch in CF app.

I spent some time reviewing this patch today and found myself still
quite uncomfortable with the fact that it was adding execution-time
work to track the types of parameters - types that would usually not
even be used.  I found the changes to nodeNestLoop.c to be
particularly objectionable, because we could end up doing the work
over and over when it is actually not needed at all, or at most once.
I decided to try instead teaching the planner to keep track of the
types of PARAM_EXEC parameters as they were created, and that seems to
work fine.  See 0001, attached.

0002, attached, is my worked-over version of the rest of the patch.  I
moved the code that serializes and deserializes PARAM_EXEC -- which
seemed like a strange choice to put in nodeSubplan.c -- to
execParallel.c.  I removed the type OID from the serialization format
because there's no reason to do that any more; the worker already
knows the types from the plan.  I did some renaming of the functions
involved and some adjustment of the comments to refer to "PARAM_EXEC
parameters" instead of initPlan parameters, because there's no reason
that I can see why this can only work for initPlans.  A Gather node on
the inner side of a nested loop doesn't sound like a great idea, but I
think this infrastructure could handle it (though it would need some
more planner work).

I broke a lot of long lines in your version of the patch into
multiple lines; please try to be attentive to this issue when writing
patches in general, as it is a bit tedious to go through and insert
line breaks in many places.

Please let me know your thoughts on the attached patches.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0001-param-exec-types-v1.patch
Description: Binary data

0002-pq-pushdown-initplan-rebased.patch
Description: Binary data
Re: [HACKERS] Incorrect comment for build_child_join_rel
On Fri, Nov 10, 2017 at 4:34 AM, Etsuro Fujita wrote:
> Here is a small patch for $Subject.

Good catch.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks
On Fri, Nov 10, 2017 at 10:34 AM, Tom Lane wrote:
> I think as far as that goes, we can just change to "Therefore, by default
> their use is restricted ...".  Then I suggest adding a para
> after that, with wording along the lines of
>
> It is possible to GRANT use of server-side lo_import and lo_export to
> non-superusers, but careful consideration of the security implications
> is required.  A malicious user of such privileges could easily parlay
> them into becoming superuser (for example by rewriting server
> configuration files), or could attack the rest of the server's file
> system without bothering to obtain database superuser privileges as
> such.  Access to roles having such privilege must therefore be guarded
> just as carefully as access to superuser roles.  Nonetheless, if use
> of server-side lo_import or lo_export is needed for some routine task,
> it's safer to use a role of this sort than full superuser privilege,
> as that helps to reduce the risk of damage from accidental errors.

+1.  That seems like great language to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] Faster processing at Gather node
On Fri, Nov 10, 2017 at 5:44 AM, Amit Kapila wrote:
> I am seeing the assertion failure as below on executing the above
> mentioned Create statement:
>
> TRAP: FailedAssertion("!(!(tup->t_data->t_infomask & 0x0008))", File:
> "heapam.c", Line: 2634)
> server closed the connection unexpectedly
> This probably means the server terminated abnormally

OK, I see it now.  Not sure why I couldn't reproduce this before.

I think the problem is not actually with the code that I just wrote.
What I'm seeing is that the slot descriptor's tdhasoid value is false
for both the funnel slot and the result slot; therefore, we conclude
that no projection is needed to remove the OIDs.  That seems to make
sense: if the funnel slot doesn't have OIDs and the result slot
doesn't have OIDs either, then we don't need to remove them.
Unfortunately, even though the funnel slot descriptor is marked
tdhasoid = false, the tuples being stored there actually do have OIDs.
And that is because they are coming from the underlying sequential
scan, which *also* has OIDs despite the fact that tdhasoid for its
slot is false.

This had me really confused until I realized that there are two
processes involved.  The problem is that we don't pass eflags down to
the child process -- so in the user backend, everybody agrees that
there shouldn't be OIDs anywhere, because EXEC_FLAG_WITHOUT_OIDS is
set.  In the parallel worker, however, it's not set, so the worker
feels free to do whatever comes naturally, and in this test case that
happens to be returning tuples with OIDs.  Patch for this attached.

I also noticed that the code that initializes the funnel slot is using
its own PlanState rather than the outer plan's PlanState to call
ExecContextForcesOids.  I think that's formally incorrect, because the
goal is to end up with a slot that is the same as the outer plan's
slot.  It doesn't matter because ExecContextForcesOids doesn't care
which PlanState it gets passed, but the comments in
ExecContextForcesOids imply that someday it might, so perhaps it's
best to clean that up.  Patch for this attached, too.

And here are the other patches again, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0001-pass-eflags-to-worker-v1.patch
Description: Binary data

0002-forces-oids-neatnikism-v1.patch
Description: Binary data

0003-skip-gather-project-v2.patch
Description: Binary data

0004-shm-mq-less-spinlocks-v2.patch
Description: Binary data

0005-shm-mq-reduce-receiver-latch-set-v1.patch
Description: Binary data

0006-remove-memory-leak-protection-v1.patch
Description: Binary data
Re: [HACKERS] [POC] Faster processing at Gather node
On Thu, Nov 9, 2017 at 9:31 PM, Amit Kapila wrote:
> Have you set force_parallel_mode=regress; before running the
> statement?

Yes, I tried that first.

> If so, then why you need to tune other parallel query
> related parameters?

Because I couldn't get it to break the other way, I then tried this.
Instead of asking me what I did, can you tell me what I need to do?
Maybe a self-contained reproducible test case including exactly what
goes wrong on your end?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] hash partitioning
On Wed, Nov 1, 2017 at 6:16 AM, amul sul wrote:
> Fixed in the 0003 patch.

I have committed this patch set with the attached adjustments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

hash-adjustments.patch
Description: Binary data
Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks
On Thu, Nov 9, 2017 at 2:56 PM, Stephen Frost wrote:
> I agree that it may not be obvious which cases lead to a relatively easy
> way to obtain superuser and which don't, and that's actually why I'd
> much rather it be something that we're considering and looking out for
> because, frankly, we're in a much better position generally to realize
> those cases than our users are.

I disagree.  It's flattering to imagine that PostgreSQL developers, as
a class, are smarter than PostgreSQL users, but it doesn't match my
observations.

> Further, I agree entirely that we
> shouldn't be deciding that certain capabilities are never allowed to be
> given to a user- but that's why superuser *exists* and why it's possible
> to give superuser access to multiple users.  The question here is if
> it's actually sensible for us to make certain actions be grantable to
> non-superusers which allow that role to become, or to essentially be, a
> superuser.  What that does, just as it does in the table case, from my
> viewpoint at least, is make it *look* to admins like they're grant'ing
> something less than superuser when, really, they're actually giving the
> user superuser-level access.  That violates the POLA because the admin
> didn't write 'ALTER USER joe WITH SUPERUSER', they just wrote 'GRANT
> EXECUTE ON lo_import() TO joe;'.

This is exactly the kind of thing that I'm talking about.  Forcing an
administrator to hand out full superuser access instead of being able
to give just lo_import() makes life difficult for users for no good
reason.  The existence of a theoretically-exploitable security
vulnerability is not tantamount to really having access, especially on
a system with a secured logging facility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] why not parallel seq scan for slow functions
On Thu, Nov 9, 2017 at 3:47 AM, Amit Kapila wrote:
> I think I understood your concern after some offlist discussion and it
> is primarily due to the inheritance related check which can skip the
> generation of gather paths when it shouldn't.  So what might fit
> better here is a straight check on the number of base rels such that
> allow generating gather path in set_rel_pathlist, if there are
> multiple baserels involved.  I have used all_baserels which I think
> will work better for this purpose.

Yes, that looks a lot more likely to be correct.  Let's see what Tom
thinks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks
On Thu, Nov 9, 2017 at 1:52 PM, Stephen Frost wrote:
>> I disagree that that is, or should be, a guiding principle.
>
> It was what I was using as the basis of the work which I did in this
> area and, at least at that time, there seemed to be little issue with
> that.

Well, there actually kind of was.  It came up here:

http://postgr.es/m/ca+tgmoy6ne5n4jc5awxser2g2gdgr4omf7edcexamvpf_jq...@mail.gmail.com

I mis-remembered who was on which side of the debate, though, hence
the comment about employers.  But never mind about that, since I was
wrong.  Sorry for not checking that more carefully before.

> This is not unlike the discussions we've had in the past around allowing
> non-owners of a table to modify properties of a table, where the
> argument has been successfully and clearly made that if you make the
> ability to change the table a GRANT'able right, then that can be used to
> become the role which owns the table without much difficulty, and so
> there isn't really a point in making that right independently
> GRANT'able.  We have some of that explicitly GRANT'able today due to
> requirements of the spec, but that's outside of our control.

I don't think it's quite the same thing.  I wouldn't go out of my way
to invent grantable table rights that just let you usurp the table
owner's permissions, but I still think it's better to have a uniform
rule that functions we ship don't contain hard-coded superuser checks.
One problem is that which functions allow an escalation to superuser
that is easy enough or reliable enough may not actually be a very
bright line in all cases.  But more than that, I think it's a
legitimate decision to grant to a user a right that COULD lead to a
superuser escalation and trust them not to use it that way, or to
prevent them from using it that way by some means not known to the
database system (e.g. route all queries through pgbouncer and send
them to a teletype; if a breach is detected, go to the teletype room,
read the paper contained therein, and decide who to
fire/imprison/execute at gunpoint).  It shouldn't be our job to decide
that granting a certain right is NEVER ok.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks
On Thu, Nov 9, 2017 at 1:16 PM, Stephen Frost wrote:
> While we have been working to reduce the number of superuser() checks in
> the backend in favor of having the ability to GRANT explicit rights, one
> of the guideing principles has always been that capabilities which can
> be used to gain superuser rights with little effort should remain
> restricted to the superuser, which is why the lo_import/lo_export hadn't
> been under consideration for superuser-check removal in the analysis I
> provided previously.

I disagree that that is, or should be, a guiding principle.  I'm not
sure that anyone other than you and your fellow employees at Crunchy
has ever agreed that this is some kind of principle.  You make it
sound like there's a consensus about this, but I think there isn't.

I think our guiding principle should be to get rid of ALL of the
hard-coded superuser checks and let people GRANT what they want.  If
they grant a permission that results in somebody escalating to
superuser, they get to keep both pieces.  Such risks might be worth
documenting, but we shouldn't obstruct people from doing it.  In the
same way, Linux will not prevent you from making a binary setuid
regardless of what the binary does.  If you make a binary setuid root
that lets someone hack root, that's your fault, not the operating
system's.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] Faster processing at Gather node
On Thu, Nov 9, 2017 at 12:08 AM, Amit Kapila wrote:
> This change looks suspicious to me.  I think here we can't use the
> tupDesc constructed from targetlist.  One problem, I could see is that
> the check for hasOid setting in tlist_matches_tupdesc won't give the
> correct answer.  In case of the scan, we use the tuple descriptor
> stored in relation descriptor which will allow us to take the right
> decision in tlist_matches_tupdesc.  If you try the statement CREATE
> TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r'; in
> force_parallel_mode=regress, then you can reproduce the problem I am
> trying to highlight.

I tried this, but nothing seemed to be obviously broken.  Then I
realized that the CREATE TABLE command wasn't using parallelism, so I
retried with parallel_setup_cost = 0, parallel_tuple_cost = 0, and
min_parallel_table_scan_size = 0.  That got it to use parallel query,
but I still don't see anything broken.  Can you clarify further?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Append implementation
On Sat, Oct 28, 2017 at 5:50 AM, Robert Haas wrote:
> No, because the Append node is *NOT* getting copied into shared
> memory.  I have pushed a comment update to the existing functions; you
> can use the same comment for this patch.

I spent the last several days working on this patch, which had a
number of problems both cosmetic and functional.  I think the attached
is in better shape now, but it could certainly use some more review
and testing since I only just finished modifying it, and I modified it
pretty heavily.  Changes:

- I fixed the "morerows" entries in the documentation.  If you had
built the documentation the way you had it and loaded it up in a web
browser, you would have seen that the way you had it was not correct.

- I moved T_AppendState to a different position in the switch inside
ExecParallelReInitializeDSM, so as to keep that switch in the same
order as all of the other switch statements in that file.

- I rewrote the comment for pa_finished.  It previously began with
"workers currently executing the subplan", which is not an accurate
description.  I suspect this was a holdover from a previous version of
the patch in which this was an array of integers rather than an array
of type bool.  I also fixed the comment in ExecAppendEstimate and
added, removed, or rewrote various other comments as well.

- I renamed PA_INVALID_PLAN to INVALID_SUBPLAN_INDEX, which I think is
more clear and allows for the possibility that this sentinel value
might someday be used for non-parallel-aware Append plans.

- I largely rewrote the code for picking the next subplan.  A
superficial problem with the way that you had it is that you had
renamed exec_append_initialize_next to exec_append_seq_next but not
updated the header comment to match.  Also, the logic was spread out
all over the file.  There are three cases: not parallel aware, leader,
worker.  You had the code for the first case at the top of the file
and the other two cases at the bottom of the file, and used multiple
"if" statements to pick the right one in each case.  I replaced all
that with a function pointer stored in the AppendState, moved the code
so it's all together, and rewrote it in a way that I find easier to
understand.  I also changed the naming convention.

- I renamed pappend_len to pstate_len and ParallelAppendDescData to
ParallelAppendState.  I think the use of the word "descriptor" is a
carryover from the concept of a scan descriptor.  There's nothing
really wrong with inventing the concept of an "append descriptor", but
it seems more clear to just refer to shared state.

- I fixed ExecAppendReInitializeDSM not to reset node->as_whichplan.
Per commit 41b0dd987d44089dc48e9c70024277e253b396b7, that's wrong;
instead, local state should be reset in ExecReScanAppend.  I installed
what I believe to be the correct logic in that function instead.

- I fixed list_qsort() so that it copies the type of the old list into
the new list.  Otherwise, sorting a list of type T_IntList or
T_OidList would turn it into just plain T_List, which is wrong.

- I removed get_append_num_workers and integrated the logic into the
callers.  This function was coded quite strangely: it assigned the
return value of fls() to a double and then eventually rounded the
result back to an integer.  But fls() returns an integer, so this
doesn't make much sense.  On a related note, I made it use fls(# of
subpaths) instead of fls(# of subpaths)+1.  Adding 1 doesn't make
sense to me here because it leads to a decision to use 2 workers for a
single, non-partial subpath.  I suspect both of these mistakes stem
from thinking that fls() returns the base-2 logarithm, but in fact it
doesn't, quite: log2(1) = 0.0 but fls(1) = 1.

- In the process of making the changes described in the previous
point, I added a couple of assertions, one of which promptly failed.
It turns out the reason is that your patch didn't update
accumulate_append_subpaths(), which can result in flattening
non-partial paths from a Parallel Append into a parent Append's list
of partial paths, which is bad.  The easiest way to fix that would be
to just teach accumulate_append_subpaths() not to flatten a Parallel
Append into a parent Append or MergeAppend node, but it seemed to me
that there was a fair amount of duplication between
accumulate_partialappend_subpath() and accumulate_append_subpaths(),
so what I did instead is folded all of the necessary logic directly
into accumulate_append_subpaths().  This approach also avoids some
assumptions that your code made, such as the assumption that we will
never have a partial MergeAppend path.

- I changed create_append_path() so that it uses the parallel_aware
argument as the only determinant of whether the output path is flagged
as parallel-aware.  Your version also considered whether
parallel_workers > 0, but I think that's not a good idea; the caller
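On the fls() point: fls() returns the position of the most significant set bit, not the base-2 logarithm, so log2(1) = 0 but fls(1) = 1. A portable sketch of that behavior, plus a worker-count heuristic along the lines described (the max_workers cap is an assumption added for the example, not the patch's actual code):

```c
#include <assert.h>

/*
 * Portable fls(): index of the highest set bit, counting from 1.
 * my_fls(0) == 0, my_fls(1) == 1, my_fls(8) == 4 -- note this is NOT
 * floor(log2(x)), which would be 0 for x == 1 and 3 for x == 8.
 */
static int
my_fls(unsigned int x)
{
    int pos = 0;

    while (x != 0)
    {
        pos++;
        x >>= 1;
    }
    return pos;
}

/*
 * Worker-count heuristic as described above: fls(# of subpaths),
 * without the +1 that wrongly asked for 2 workers for a single
 * subpath, capped at some externally supplied maximum.
 */
static int
append_num_workers(int nsubpaths, int max_workers)
{
    int n = my_fls((unsigned int) nsubpaths);

    return n < max_workers ? n : max_workers;
}
```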
Re: [HACKERS] pageinspect option to forgo buffer locking?
On Thu, Nov 9, 2017 at 12:58 PM, Andres Freund wrote:
> You can already pass arbitrary byteas to heap_page_items(), so I don't
> see how this'd change anything exposure-wise?  Or are you thinking that
> users would continually do this with actual page contents and would be
> more likely to hit edge cases than if using pg_read_binary_file() or
> such (which obviously sees an out of date page)?

More the latter.  It's not really an increase in terms of security
exposure, but it might lead to more trouble in practice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pageinspect option to forgo buffer locking?
On Thu, Nov 9, 2017 at 12:49 PM, Andres Freund wrote:
> Occasionally, when debugging issues, I find it quite useful to be able
> to do a heap_page_items() on a page that's actually locked exclusively
> concurrently.  E.g. investigating the recent multixact vacuuming issues,
> it was very useful to attach a debugger to one backend to step through
> freezing, and display the page in another session.
>
> Currently the locking in get_raw_page_internal() prevents that.  If it's
> an option defaulting to off, I don't see why we couldn't allow that to
> skip locking the page's contents.  Obviously you can get corrupted
> contents that way, but we already allow to pass arbitrary stuff to
> heap_page_items().  Since pinning wouldn't be changed, there's no danger
> of the page being moved out from under us.

heap_page_items() is, if I remember correctly, not necessarily going
to tolerate malformed input very well - I think that's why we restrict
all of these functions to superusers.  So using it in this way would
seem to increase the risk of a server crash or other horrible
misbehavior.  Of course if we're just dev-testing, that doesn't really
matter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Runtime Partition Pruning
On Thu, Nov 9, 2017 at 6:18 AM, Beena Emerson wrote:
> The code still chooses the custom plan instead of the generic plan for
> the prepared statements.  I am working on it.

I don't think it's really the job of this patch to do anything about
that problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Wed, Nov 8, 2017 at 9:40 AM, Masahiko Sawada wrote: > Speaking of the acquiring these four lock types and heavy weight lock, > there obviously is a call path to acquire any of four lock types while > holding a heavy weight lock. In reverse, there also is a call path > that we acquire a heavy weight lock while holding any of four lock > types. The call path I found is that in heap_delete we acquire a tuple > lock and call XactLockTableWait or MultiXactIdWait which eventually > could acquire LOCKTAG_TRANSACTION in order to wait for the concurrent > transactions finish. But IIUC since these functions acquire the lock > for the concurrent transaction's transaction id, deadlocks doesn't > happen. No, that's not right. Now that you mention it, I realize that tuple locks can definitely cause deadlocks. Example: setup: rhaas=# create table foo (a int, b text); CREATE TABLE rhaas=# create table bar (a int, b text); CREATE TABLE rhaas=# insert into foo values (1, 'hoge'); INSERT 0 1 session 1: rhaas=# begin; BEGIN rhaas=# update foo set b = 'hogehoge' where a = 1; UPDATE 1 session 2: rhaas=# begin; BEGIN rhaas=# update foo set b = 'quux' where a = 1; session 3: rhaas=# begin; BEGIN rhaas=# lock bar; LOCK TABLE rhaas=# update foo set b = 'blarfle' where a = 1; back to session 1: rhaas=# select * from bar; ERROR: deadlock detected LINE 1: select * from bar; ^ DETAIL: Process 88868 waits for AccessShareLock on relation 16391 of database 16384; blocked by process 88845. Process 88845 waits for ExclusiveLock on tuple (0,1) of relation 16385 of database 16384; blocked by process 88840. Process 88840 waits for ShareLock on transaction 1193; blocked by process 88868. HINT: See server log for query details. So what I said before was wrong: we definitely cannot exclude tuple locks from deadlock detection. 
However, we might be able to handle the problem in another way: introduce a separate, parallel-query-specific mechanism to avoid having two participants try to update and/or delete the same tuple at the same time - e.g. advertise the BufferTag + offset within the page in DSM, and if somebody else already has that same combination advertised, wait until they no longer do. That shouldn't ever deadlock, because the other worker shouldn't be able to find itself waiting for us while it's busy updating a tuple. After some further study, speculative insertion locks look problematic too. I'm worried about the code path ExecInsert() [taking a speculative insertion lock] -> heap_insert -> heap_prepare_insert -> toast_insert_or_update -> toast_save_datum -> heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock). That sure looks like we can end up waiting for a relation lock while holding a speculative insertion lock, which seems to mean that speculative insertion locks are subject to at least theoretical deadlock hazards as well. Note that even if we were guaranteed to be holding the lock on the toast relation already at this point, it wouldn't fix the problem, because we might still have to build or refresh a relcache entry at this point, which could end up scanning (and thus locking) system catalogs. Any syscache lookup can theoretically take a lock, even though most of the time it doesn't, and thus taking a lock that has been removed from the deadlock detector (or, say, an lwlock) and then performing a syscache lookup with it held is not OK. So I don't think we can remove speculative insertion locks from the deadlock detector either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small improvement to compactify_tuples
On Wed, Nov 8, 2017 at 10:33 AM, Tom Lane wrote: > I do not think there is any change here that can be proven to always be a > win. Certainly the original patch, which proposes to replace an O(n log n) > sort algorithm with an O(n^2) one, should not be thought to be that. > The question to focus on is what's the average case, and I'm not sure how > to decide what the average case is. But more than two test scenarios > would be a good start. I appreciate the difficulties here; I'm just urging caution. Let's not change things just to clear this patch off our plate. Just to throw a random idea out here, we currently have gen_qsort_tuple.pl producing qsort_tuple() and qsort_ssup(). Maybe it could be modified to also produce a specialized qsort_itemids(). That might be noticeably faster than our general-purpose qsort() for the reasons mentioned in the comments in gen_qsort_tuple.pl, viz: # The major effects are (1) inlining simple tuple comparators is much faster # than jumping through a function pointer and (2) swap and vecswap operations # specialized to the particular data type of interest (in this case, SortTuple) # are faster than the generic routines. I don't remember any more just how much faster qsort_tuple() and qsort_ssup() are than plain qsort(), but it was significant enough to convince me to commit 337b6f5ecf05b21b5e997986884d097d60e4e3d0... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Nov 8, 2017 at 7:26 AM, Amit Kapila wrote: > We do want to generate it later when there isn't inheritance involved, > but only if there is a single rel involved (simple_rel_array_size > <=2). The rule is something like this, we will generate the gather > paths at this stage only if there are more than two rels involved and > there isn't inheritance involved. Why is that the correct rule? Sorry if I'm being dense here. I would have thought we'd want to skip it for the topmost scan/join rel regardless of the presence or absence of inheritance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small improvement to compactify_tuples
On Tue, Nov 7, 2017 at 4:39 PM, Tom Lane wrote: > What I'm getting from the standard pgbench measurements, on both machines, > is that this patch might be a couple percent slower than HEAD, but that is > barely above the noise floor so I'm not too sure about it. Hmm. It seems like slowing down single client performance by a couple of percent is something that we really don't want to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why not parallel seq scan for slow functions
On Tue, Nov 7, 2017 at 9:41 PM, Amit Kapila wrote: > This is required to prohibit generating gather path for top rel in > case of inheritence (Append node) at this place (we want to generate > it later when scan/join target is available). OK, but why don't we want to generate it later when there isn't inheritance involved? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
On Tue, Nov 7, 2017 at 4:35 AM, Haribabu Kommi wrote: > The newly added option is not recommended to be used in normal cases and > it is used only for upgrade utilities. I don't know why it couldn't be used in normal cases. That seems like a totally legitimate thing for somebody to want. Maybe nobody does, but I see no reason to worry if they do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_wal_write statistics view
On Tue, Nov 7, 2017 at 4:31 AM, Haribabu Kommi wrote: >> Updated patch attached. > Patch rebased. I think the earlier concerns about the performance impact of this are probably very valid concerns, and I don't see how the new version of the patch gets us much closer to solving them. I am also not sure I understand how the backend_write_blocks column is intended to work. The only call to pgstat_send_walwrites() is in WalWriterMain, so where do the other backends report anything? Also, if there's only ever one global set of counters (as opposed to one per table, say) then why use the stats collector machinery for this at all, vs. having a structure in shared memory that can be updated directly? It seems like adding a lot of overhead for no functional benefit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix a typo in dsm_impl.c
On Mon, Nov 6, 2017 at 11:22 PM, Masahiko Sawada wrote: > Attached the patch for $subject. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Hash take II
On Tue, Nov 7, 2017 at 4:01 PM, Andres Freund wrote: > + ResourceOwnerEnlargeFiles(CurrentResourceOwner); > + ResourceOwnerRememberFile(CurrentResourceOwner, file); > + VfdCache[file].resowner = CurrentResourceOwner; > > So maybe I'm being pedantic here, but wouldn't the right order be to do > ResourceOwnerEnlargeFiles() *before* creating the file? It's a memory > allocating operation, so it can fail, which'd leak the file. That's not pedantic ... that's a very sound criticism. IMHO, anyway. > diff --git a/src/backend/utils/resowner/resowner.c > b/src/backend/utils/resowner/resowner.c > index 4c35ccf65eb..8b91d5a6ebe 100644 > --- a/src/backend/utils/resowner/resowner.c > +++ b/src/backend/utils/resowner/resowner.c > @@ -528,16 +528,6 @@ ResourceOwnerReleaseInternal(ResourceOwner owner, > PrintRelCacheLeakWarning(res); > RelationClose(res); > } > - > - /* Ditto for dynamic shared memory segments */ > - while (ResourceArrayGetAny(&(owner->dsmarr), &foundres)) > - { > - dsm_segment *res = (dsm_segment *) > DatumGetPointer(foundres); > - > - if (isCommit) > - PrintDSMLeakWarning(res); > - dsm_detach(res); > - } > } > else if (phase == RESOURCE_RELEASE_LOCKS) > { > @@ -654,6 +644,16 @@ ResourceOwnerReleaseInternal(ResourceOwner owner, > PrintFileLeakWarning(res); > FileClose(res); > } > + > + /* Ditto for dynamic shared memory segments */ > + while (ResourceArrayGetAny(&(owner->dsmarr), &foundres)) > + { > + dsm_segment *res = (dsm_segment *) > DatumGetPointer(foundres); > + > + if (isCommit) > + PrintDSMLeakWarning(res); > + dsm_detach(res); > + } > } > > Is that entirely unproblematic? Are there any DSM callbacks that rely on > locks still being held? Please split this part into a separate commit > with such analysis. FWIW, I think this change is a really good idea (I recommended it to Thomas at some stage, I think). 
The current positioning was decided by me at a very early stage of parallel query development where I reasoned as follows: (1) the first thing we're going to implement is going to be parallel quicksort, (2) that's going to allocate a huge amount of DSM, (3) therefore we should try to free it as early as possible. However, I now think that was wrongheaded, and not just because parallel quicksort didn't turn out to be the first thing we developed. Memory is the very last resource we should release when aborting a transaction, because any other resource we have is tracked using data structures that are stored in memory. Throwing the memory away before the end therefore makes life very difficult. That's why, for backend-private memory, we clean up most everything else in AbortTransaction() and then only destroy memory contexts in CleanupTransaction(). This change doesn't go as far, but it's in the same general direction, and I think rightly so. My error was in thinking that the primary use of memory would be for storing data, but really it's about where you put your control structures. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why not parallel seq scan for slow functions
On Mon, Nov 6, 2017 at 9:57 PM, Amit Kapila wrote: >> Well, I suppose that test will fire for a baserel when the total >> number of baserels is at least 3 and there's no inheritance involved. >> But if there are 2 baserels, we're still not the topmost scan/join >> target. > > No, if there are 2 baserels, then simple_rel_array_size will be 3. > The simple_rel_array_size is always the "number of relations" plus > "one". See setup_simple_rel_arrays. Oh, wow. That's surprising. >> Also, even if inheritance is used, we might still be the >> topmost scan/join target. > > Sure, but in that case, it won't generate the gather path here (due to > this part of check "!root->append_rel_list"). I am not sure whether I > have understood the second part of your question, so if my answer > appears inadequate, then can you provide more details on what you are > concerned about? I don't know why the question of why root->append_rel_list is empty is the relevant thing here for deciding whether to generate gather paths at this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Mon, Nov 6, 2017 at 5:19 AM, Ashutosh Bapat wrote: > IIRC, only thing that changes between plan time quals and execution > time quals is constaint folding of constant parameters. But I don't > think we change the selectivity estimates when that's done. At the > same time, I don't think we should make a lot of effort to make sure > that the order used during the estimation is same as the order at the > execution; we are anyway estimating. There can always be some > difference between what's estimated and what actually happens. I don't think that's a good justification for allowing the order to vary. It's true that, at execution time, we may find row counts and selectivities to be different than what we predicted, but that is a case of the real data being different from our idealized data -- which is difficult to avoid in general. Allowing our actual decisions to be different from the decisions we thought we would make seems like programmer sloppiness. It would also be very confusing if the planner uses one ordering to estimate the cost and then a different order at execution time and in the EXPLAIN ANALYZE output. How could anybody understand what had happened in such a case? I think it would be a good idea, as Thomas says, to order the qual clauses at an earlier stage and then remember our decision. However, we have to think about whether that's going to increase planning time in a noticeable way. I wonder why we currently postpone this until such a late phase of planning. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada wrote: >>> I suggest that a good thing to do more or less immediately, regardless >>> of when this patch ends up being ready, would be to insert an >>> insertion that LockAcquire() is never called while holding a lock of >>> one of these types. If that assertion ever fails, then the whole >>> theory that these lock types don't need deadlock detection is wrong, >>> and we'd like to find out about that sooner or later. >> >> I understood. I'll check that first. > > I've checked whether LockAcquire is called while holding a lock of one > of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, > LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summary, I think that > we cannot move these four lock types together out of heavy-weight > lock, but can move only the relation extension lock with tricks. > > Here is detail of the survey. Thanks for these details, but I'm not sure I fully understand. > * LOCKTAG_RELATION_EXTENSION > There is a path that LockRelationForExtension() could be called while > holding another relation extension lock. In brin_getinsertbuffer(), we > acquire a relation extension lock for a index relation and could > initialize a new buffer (brin_initailize_empty_new_buffer()). During > initializing a new buffer, we call RecordPageWithFreeSpace() which > eventually can call fsm_readbuf(rel, addr, true) where the third > argument is "extend". We can process this problem by having the list > (or local hash) of acquired locks and skip acquiring the lock if > already had. For other call paths calling LockRelationForExtension, I > don't see any problem. Does calling fsm_readbuf(rel,addr,true) take some heavyweight lock? Basically, what matters here in the end is whether we can articulate a deadlock-proof rule around the order in which these locks are acquired. The simplest such rule would be "you can only acquire one lock of any of these types at a time, and you can't subsequently acquire a heavyweight lock". 
But a more complicated rule would be OK too, e.g. "you can acquire as many heavyweight locks as you want, and after that you can optionally acquire one page, tuple, or speculative token lock, and after that you can acquire a relation extension lock". The latter rule, although more complex, is still deadlock-proof, because the heavyweight locks still use the deadlock detector, and the rest has a consistent order of lock acquisition that precludes one backend taking A then B while another backend takes B then A. I'm not entirely clear whether your survey leads us to a place where we can articulate such a deadlock-proof rule. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPDATE of partition key
On Wed, Oct 25, 2017 at 11:40 AM, Amit Khandekar wrote: > Below I have addressed the remaining review comments : The changes to trigger.c still make me super-nervous. Hey THOMAS MUNRO, any chance you could review that part? + /* The caller must have already locked all the partitioned tables. */ + root_rel = heap_open(root_relid, NoLock); + *all_part_cols = NULL; + foreach(lc, partitioned_rels) + { + Index rti = lfirst_int(lc); + Oid relid = getrelid(rti, rtables); + Relation part_rel = heap_open(relid, NoLock); + + pull_child_partition_columns(part_rel, root_rel, all_part_cols); + heap_close(part_rel, NoLock); I don't like the fact that we're opening and closing the relation here just to get information on the partitioning columns. I think it would be better to do this someplace that already has the relation open and store the details in the RelOptInfo. set_relation_partition_info() looks like the right spot. +void +pull_child_partition_columns(Relation rel, + Relation parent, + Bitmapset **partcols) This code has a lot in common with is_partition_attr(). I'm not sure it's worth trying to unify them, but it could be done. + * 'num_update_rri' : number of UPDATE per-subplan result rels. For INSERT, Instead of " : ", you could just write "is the". + * For Updates, if the leaf partition is already present in the + * per-subplan result rels, we re-use that rather than initialize a + * new result rel. The per-subplan resultrels and the resultrels of + * the leaf partitions are both in the same canonical order. So while It would be good to explain the reason. Also, Updates shouldn't be capitalized here. + Assert(cur_update_rri <= update_rri + num_update_rri - 1); Maybe just cur_update_rri < update_rri + num_update_rri, or even current_update_rri - update_rri < num_update_rri. Also, +1 for Amit Langote's idea of trying to merge mt_perleaf_childparent_maps with mt_persubplan_childparent_maps. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Client Connection redirection support for PostgreSQL
On Thu, Nov 2, 2017 at 4:33 PM, Craig Ringer wrote: >> Add the ability to the PostgreSQL server instance to route the traffic to a >> different server instance based on the rules defined in server’s pg_bha.conf >> configuration file. At a high level this enables offloading the user >> requests to a different server instance based on the rules defined in the >> pg_hba.conf configuration file. > > pg_hba.conf is "host based access [control]" . I'm not sure it's > really the right place. Well, we could invent someplace else, but I'm not sure I quite see the point (full disclosure: I suggested the idea of doing this via pg_hba.conf in an off-list discussion). I do think the functionality is useful, for the same reasons that HTTP redirects are useful. For example, let's say you have all of your databases for various clients on a single instance. Then, one client starts using a lot more resources, so you want to move that client to a separate instance on another VM. You can set up logical replication to replicate all of the data to the new instance, and then add a pg_hba.conf entry to redirect connections to that database to the new master (this would be even smoother if we had multi-master replication in core). So now that client is moved off to another machine in a completely client-transparent way. I think that's pretty cool. > When this has come up before, one of the issues has been determining > what exactly should constitute "read only" vs "read write" for the > purposes of redirecting work. Yes, that needs some thought. > Backends used just for a redirect would be pretty expensive though. Not as expensive as proxying the whole connection, as pgpool and other systems do today. I think the in-core use of this redirect functionality is useful, but I think the real win would be optionally using it in pgpool and pgbouncer. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
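To make the pg_hba.conf idea above concrete: the entry might look something like the following. This syntax is entirely hypothetical - no "redirect" method exists today, and the host/port option names are invented - it only illustrates the shape such a rule could take for the moved-client scenario described above.

```
# Hypothetical pg_hba.conf entry -- the "redirect" method does not
# exist; this only sketches the proposal.  Clients connecting to
# database "bigclient" would be told to reconnect to the new master.
# TYPE  DATABASE    USER  ADDRESS      METHOD
host    bigclient   all   0.0.0.0/0    redirect host=othervm.example.com port=5432
```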
Re: [HACKERS] why not parallel seq scan for slow functions
On Mon, Nov 6, 2017 at 11:20 AM, Amit Kapila wrote: > On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas wrote: >> This looks like it's on the right track to me. I hope Tom will look >> into it, but if he doesn't I may try to get it committed myself. >> >> -if (rel->reloptkind == RELOPT_BASEREL) >> -generate_gather_paths(root, rel); >> +if (rel->reloptkind == RELOPT_BASEREL && >> +root->simple_rel_array_size > 2 && >> +!root->append_rel_list) >> >> This test doesn't look correct to me. Actually, it doesn't look >> anywhere close to correct to me. So, one of us is very confused... >> not sure whether it's you or me. >> > It is quite possible that I haven't got it right, but it shouldn't be > completely bogus as it stands the regression tests and some manual > verification. Can you explain what is your concern about this test? Well, I suppose that test will fire for a baserel when the total number of baserels is at least 3 and there's no inheritance involved. But if there are 2 baserels, we're still not the topmost scan/join target. Also, even if inheritance is used, we might still be the topmost scan/join target. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Early locking option to parallel backup
On Mon, Nov 6, 2017 at 4:43 AM, Tom Lane wrote: > I wonder if we couldn't somehow repurpose the work that was done for > parallel workers' locks. Lots of security-type issues to be handled > if we're to open that up to clients, but maybe it's solvable. For > instance, maybe only allowing it to clients sharing the same snapshot > would help. Interesting idea. There's a bunch of holes that would need to be patched there; for instance, you can't have one session running DDL while somebody else has AccessShareLock. Parallel query relies on the parallel-mode restrictions to prevent that kind of thing from happening, but it would be strange (and likely somewhat broken) to try to enforce those here. It would be strange and probably bad if LOCK TABLE a; LOCK TABLE b in one session and LOCK TABLE b; LOCK TABLE a in another session failed to deadlock. In short, there's a big difference between a single session using multiple processes and multiple closely coordinated sessions. Also, even if you did it, you still need a lot of PROCLOCKs. Workers don't need to take all locks up front because they can be assured of getting them later, but they've still got to lock the objects they actually want to access. Group locking aims to prevent deadlocks between cooperating processes; it is not a license to skip locking altogether. None of which is to say that the problems don't feel related somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Early locking option to parallel backup
On Sun, Nov 5, 2017 at 5:17 AM, Lucas wrote: > The patch creates a "--lock-early" option which will make pg_dump to issue > shared locks on all tables on the backup TOC on each parallel worker start. > That way, the backup has a very small chance of failing. When it does, > happen in the first few seconds of the backup job. My backup scripts (not > included here) are aware of that and retries the backup in case of failure. I wonder why we don't do this already ... and by default. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Custom compression methods
On Sun, Nov 5, 2017 at 2:22 PM, Oleg Bartunov wrote: >> IIRC there were some concerns about what happened with pg_upgrade, >> with consuming precious toast bits, and a few other things. > > yes, pg_upgrade may be a problem. A basic problem here is that, as proposed, DROP COMPRESSION METHOD may break your database irretrievably. If there's no data compressed using the compression method you dropped, everything is cool - otherwise everything is broken and there's no way to recover. The only obvious alternative is to disallow DROP altogether (or make it not really DROP). Both of those alternatives sound fairly unpleasant to me, but I'm not exactly sure what to recommend in terms of how to make it better. Ideally anything we expose as an SQL command should have a DROP command that undoes whatever CREATE did and leaves the database in an intact state, but that seems hard to achieve in this case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why not parallel seq scan for slow functions
On Sun, Nov 5, 2017 at 12:57 AM, Amit Kapila wrote: > Thanks for the confirmation. Find rebased patch attached. This looks like it's on the right track to me. I hope Tom will look into it, but if he doesn't I may try to get it committed myself. -if (rel->reloptkind == RELOPT_BASEREL) -generate_gather_paths(root, rel); +if (rel->reloptkind == RELOPT_BASEREL && +root->simple_rel_array_size > 2 && +!root->append_rel_list) This test doesn't look correct to me. Actually, it doesn't look anywhere close to correct to me. So, one of us is very confused... not sure whether it's you or me. simple_gather_path = (Path *) create_gather_path(root, rel, cheapest_partial_path, rel->reltarget, NULL, NULL); + +/* Add projection step if needed */ +if (target && simple_gather_path->pathtarget != target) +simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target); Instead of using apply_projection_to_path, why not pass the correct reltarget to create_gather_path? +/* Set or update cheapest_total_path and related fields */ +set_cheapest(current_rel); I wonder if it's really OK to call set_cheapest() a second time for the same relation... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Fri, Nov 3, 2017 at 5:57 PM, Alexander Korotkov wrote: > On Thu, Nov 2, 2017 at 5:56 AM, Robert Haas wrote: >> > I can propose following alternative approach: teach read-only queries on >> > hot >> > standby to tolerate concurrent relation truncation. Therefore, when >> > non-existent heap page is accessed on hot standby, we can know that it >> > was >> > deleted by concurrent truncation and should be assumed to be empty. Any >> > thoughts? >> >> Sounds like it might break MVCC? > > I don't know why it might be broken. VACUUM truncates heap only when tail > to be truncated is already empty. When applying truncate WAL record, > previous WAL records deleting all those tuples in the tail are already > applied. Thus, if even MVCC is broken and we will miss some tuples after > heap truncation, they were anyway were gone before heap truncation. Ah - I was thinking of the TRUNCATE command, rather than truncation by VACUUM. Your argument makes sense, although the case where the relation is truncated and later re-extended might need some thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] Faster processing at Gather node
On Sun, Nov 5, 2017 at 2:24 AM, Andres Freund wrote: >> shm-mq-reduce-receiver-latch-set-v1.patch causes the receiver to only >> consume input from the shared queue when the amount of unconsumed >> input exceeds 1/4 of the queue size. This caused a large performance >> improvement in my testing because it causes the number of times the >> latch gets set to drop dramatically. I experimented a bit with >> thresholds of 1/8 and 1/2 before setting on 1/4; 1/4 seems to be >> enough to capture most of the benefit. > Hm. Is consuming the relevant part, or notifying the sender about it? I > suspect most of the benefit can be captured by updating bytes read (and > similarly on the other side w/ bytes written), but not setting the latch > unless thresholds are reached. The advantage of updating the value, > even without notifying the other side, is that in the common case that > the other side gets around to checking the queue without having blocked, > it'll see the updated value. If that works, that'd address the issue > that we might wait unnecessarily in a number of common cases. I think it's mostly notifying the sender. Sending SIGUSR1 over and over again isn't free, and it shows up in profiling. I thought about what you're proposing here, but it seemed more complicated to implement, and I'm not sure that there would be any benefit. The reason is that, with these patches applied, even a radical expansion of the queue size doesn't produce much incremental performance benefit, at least in the test case I was using. I can increase the size of the tuple queues 10x or 100x and it really doesn't help very much. And consuming sooner (but sometimes without notifying) seems very similar to making the queue slightly bigger. Also, what I see in general is that the CPU usage on the leader goes to 100% but the workers are only maybe 20% saturated. Making the leader work any harder than absolutely necessary therefore seems like it's probably counterproductive. 
I may be wrong, but it looks to me like most of the remaining overhead seems to come from (1) the synchronization overhead associated with memory barriers and (2) backend-private work that isn't as cheap as would be ideal - e.g. palloc overhead. > Interesting. Here it's > +8.79% postgres postgres[.] ExecAgg > +6.52% postgres postgres[.] slot_deform_tuple > +5.65% postgres postgres[.] slot_getattr > +4.59% postgres postgres[.] shm_mq_send_bytes > +3.66% postgres postgres[.] ExecInterpExpr > +3.44% postgres postgres[.] AllocSetAlloc > +3.08% postgres postgres[.] heap_fill_tuple > +2.34% postgres postgres[.] heap_getnext > +2.25% postgres postgres[.] finalize_aggregates > +2.08% postgres libc-2.24.so[.] __memmove_avx_unaligned_erms > +2.05% postgres postgres[.] heap_compare_slots > +1.99% postgres postgres[.] execTuplesMatch > +1.83% postgres postgres[.] ExecStoreTuple > +1.83% postgres postgres[.] shm_mq_receive > +1.81% postgres postgres[.] ExecScan More or less the same functions, somewhat different order. >> I'm probably not super-excited about spending too much more time >> trying to make the _platform_memmove time (only 20% or so of which >> seems to be due to the shm_mq stuff) or the shm_mq_receive_bytes time >> go down until, say, somebody JIT's slot_getattr and slot_deform_tuple. >> :-) > > Hm, let's say somebody were working on something like that. In that case > the benefits for this precise plan wouldn't yet be that big because a > good chunk of slot_getattr calls come from execTuplesMatch() which > doesn't really provide enough context to do JITing (when used for > hashaggs, there is more so it's JITed). Similarly gather merge's > heap_compare_slots() doesn't provide such context. > > It's about ~9% currently, largely due to the faster aggregate > invocation. But the big benefit here would be all the deforming and the > comparisons... I'm not sure I follow you here. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] Faster processing at Gather node
On Sat, Nov 4, 2017 at 5:55 PM, Andres Freund wrote:
>> master: 21436.745, 20978.355, 19918.617
>> patch: 15896.573, 15880.652, 15967.176
>>
>> Median-to-median, that's about a 24% improvement.
>
> Neat!

With the attached stack of 4 patches, I get: 10811.768 ms, 10743.424 ms, 10632.006 ms, about a 49% improvement median-to-median. Haven't tried it on hydra or any other test cases yet.

skip-gather-project-v1.patch does what it says on the tin. I still don't have a test case for this, and I didn't find that it helped very much, but it would probably help more in a test case with more columns, and you said this looked like a big bottleneck in your testing, so here you go.

shm-mq-less-spinlocks-v2.patch is updated from the version I posted before based on your review comments. I don't think it's really necessary to mention that the 8-byte atomics have fallbacks here; whatever needs to be said about that should be said in some central place that talks about atomics, not in each user individually. I agree that there might be some further speedups possible by caching some things in local memory, but I haven't experimented with that.

shm-mq-reduce-receiver-latch-set-v1.patch causes the receiver to only consume input from the shared queue when the amount of unconsumed input exceeds 1/4 of the queue size. This caused a large performance improvement in my testing because it causes the number of times the latch gets set to drop dramatically. I experimented a bit with thresholds of 1/8 and 1/2 before settling on 1/4; 1/4 seems to be enough to capture most of the benefit.

remove-memory-leak-protection-v1.patch removes the memory leak protection that Tom installed upon discovering that the original version of tqueue.c leaked memory like crazy. I think that it shouldn't do that any more, courtesy of 6b65a7fe62e129d5c2b85cd74d6a91d8f7564608. Assuming that's correct, we can avoid a whole lot of tuple copying in Gather Merge and a much more modest amount of overhead in Gather.
Since my test case exercised Gather Merge, this bought ~400 ms or so.

Even with all of these patches applied, there's clearly still room for more optimization, but MacOS's "sample" profiler seems to show that the bottlenecks are largely shifting elsewhere:

Sort by top of stack, same collapsed (when >= 5):
        slot_getattr (in postgres)                                         706
        slot_deform_tuple (in postgres)                                    560
        ExecAgg (in postgres)                                              378
        ExecInterpExpr (in postgres)                                       372
        AllocSetAlloc (in postgres)                                        319
        _platform_memmove$VARIANT$Haswell (in libsystem_platform.dylib)    314
        read (in libsystem_kernel.dylib)                                   303
        heap_compare_slots (in postgres)                                   296
        combine_aggregates (in postgres)                                   273
        shm_mq_receive_bytes (in postgres)                                 272

I'm probably not super-excited about spending too much more time trying to make the _platform_memmove time (only 20% or so of which seems to be due to the shm_mq stuff) or the shm_mq_receive_bytes time go down until, say, somebody JIT's slot_getattr and slot_deform_tuple. :-)

One thing that might be worth doing is hammering on the AllocSetAlloc time. I think that's largely caused by allocating space for heap tuples and then freeing them and allocating space for new heap tuples. Gather/Gather Merge are guilty of that, but I think there may be other places in the executor with the same issue. Maybe we could have fixed-size buffers for small tuples that just get reused and only palloc for large tuples (cf. SLAB_SLOT_SIZE).

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

skip-gather-project-v1.patch
Description: Binary data

shm-mq-less-spinlocks-v2.patch
Description: Binary data

shm-mq-reduce-receiver-latch-set-v1.patch
Description: Binary data

remove-memory-leak-protection-v1.patch
Description: Binary data
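The fixed-size-buffer idea in the last paragraph could look something like the following toy sketch. All names here are invented, and the real executor would use palloc and TupleTableSlot machinery rather than malloc; the point is only that the common small-tuple path does no allocator calls at all.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Invented sketch of a reusable tuple buffer (cf. the SLAB_SLOT_SIZE idea):
 * small tuples land in a fixed inline buffer that is simply reused, and we
 * only hit the allocator for tuples too large to fit.
 */
#define TOY_SLOT_SIZE 128

typedef struct
{
    char    inline_buf[TOY_SLOT_SIZE];  /* reused for small tuples */
    char   *big_buf;                    /* allocated only when needed */
    size_t  big_buf_size;
    int     allocations;                /* malloc call count, for clarity */
} toy_slot;

/* Return storage for a tuple of len bytes, reusing buffers when possible. */
static char *
toy_slot_reserve(toy_slot *slot, size_t len)
{
    if (len <= TOY_SLOT_SIZE)
        return slot->inline_buf;        /* common path: no allocation */
    if (len > slot->big_buf_size)
    {
        free(slot->big_buf);            /* free(NULL) is a no-op */
        slot->big_buf = malloc(len);
        slot->big_buf_size = len;
        slot->allocations++;
    }
    return slot->big_buf;               /* existing large buffer reused */
}
```

A sequence of small reservations returns the same pointer with zero allocator traffic; only an oversized tuple triggers (and thereafter reuses) a heap allocation.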
Re: [HACKERS] [POC] Faster processing at Gather node
On Wed, Oct 18, 2017 at 3:09 AM, Andres Freund wrote:
> 2) The spinlocks both on the sending and receiving side are quite hot:
>
>    rafia query leader:
>    +   36.16%  postgres  postgres  [.] shm_mq_receive
>    +   19.49%  postgres  postgres  [.] s_lock
>    +   13.24%  postgres  postgres  [.] SetLatch

Here's a patch which, as per an off-list discussion between Andres, Amit, and myself, removes the use of the spinlock for most send/receive operations in favor of memory barriers and the atomics support for 8-byte reads and writes.

I tested with a pgbench -i -s 300 database with pgbench_accounts_pkey dropped and max_parallel_workers_per_gather boosted to 10. I used this query:

select aid, count(*) from pgbench_accounts group by 1 having count(*) > 1;

which produces this plan:

 Finalize GroupAggregate  (cost=1235865.51..5569468.75 rows=1000 width=12)
   Group Key: aid
   Filter: (count(*) > 1)
   ->  Gather Merge  (cost=1235865.51..4969468.75 rows=3000 width=12)
         Workers Planned: 6
         ->  Partial GroupAggregate  (cost=1234865.42..1322365.42 rows=500 width=12)
               Group Key: aid
               ->  Sort  (cost=1234865.42..1247365.42 rows=500 width=4)
                     Sort Key: aid
                     ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..541804.00 rows=500 width=4)
(10 rows)

On hydra (PPC), these changes didn't help much. Timings:

master: 29605.582, 29753.417, 30160.485
patch: 28218.396, 27986.951, 26465.584

That's about a 5-6% improvement. On my MacBook, though, the improvement was quite a bit more:

master: 21436.745, 20978.355, 19918.617
patch: 15896.573, 15880.652, 15967.176

Median-to-median, that's about a 24% improvement.

Any reviews appreciated.

Thanks,

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

shm-mq-less-spinlocks-v1.2.patch
Description: Binary data
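For illustration, here is a heavily simplified single-producer/single-consumer byte ring written with C11 atomics. It is not the shm_mq code, and all names are invented; it only shows the general pattern the patch relies on: replacing a spinlock with 8-byte atomic position counters plus acquire/release ordering, so the written bytes become visible before the advanced write position does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy SPSC ring; size is a power of two chosen for this sketch. */
#define TOY_RING_SIZE 256

typedef struct
{
    _Atomic uint64_t write_pos;   /* advanced only by the sender */
    _Atomic uint64_t read_pos;    /* advanced only by the receiver */
    char             data[TOY_RING_SIZE];
} toy_ring;

static bool
toy_ring_send(toy_ring *rb, const char *src, size_t len)
{
    uint64_t w = atomic_load_explicit(&rb->write_pos, memory_order_relaxed);
    uint64_t r = atomic_load_explicit(&rb->read_pos, memory_order_acquire);

    if (TOY_RING_SIZE - (w - r) < len)
        return false;             /* not enough free space */
    for (size_t i = 0; i < len; i++)
        rb->data[(w + i) % TOY_RING_SIZE] = src[i];
    /* publish the bytes before the new write position becomes visible */
    atomic_store_explicit(&rb->write_pos, w + len, memory_order_release);
    return true;
}

static bool
toy_ring_recv(toy_ring *rb, char *dst, size_t len)
{
    uint64_t r = atomic_load_explicit(&rb->read_pos, memory_order_relaxed);
    uint64_t w = atomic_load_explicit(&rb->write_pos, memory_order_acquire);

    if (w - r < len)
        return false;             /* not enough data available */
    for (size_t i = 0; i < len; i++)
        dst[i] = rb->data[(r + i) % TOY_RING_SIZE];
    atomic_store_explicit(&rb->read_pos, r + len, memory_order_release);
    return true;
}
```

Each side writes only its own counter and takes an acquire load of the other's, so no lock is ever held; on platforms without native 8-byte atomics a fallback would be needed, which is exactly the concern the review discussion defers to the central atomics documentation.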
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Nov 3, 2017 at 1:05 PM, Simon Riggs wrote: >> Therefore, if MERGE eventually uses INSERT .. ON CONFLICT >> UPDATE when a relevant unique index exists and does something else, >> such as your proposal of taking a strong lock, or Peter's proposal of >> doing this in a concurrency-oblivious manner, in other cases, then >> those two cases will behave very differently. > > The *only* behavioural difference I have proposed would be the *lack* > of an ERROR in (some) concurrent cases. I think that's a big difference. Error vs. non-error is a big deal by itself; also, the non-error case involves departing from MVCC semantics just as INSERT .. ON CONFLICT UPDATE does. > All I have at the moment is that a few people disagree, but that > doesn't help determine the next action. > > We seem to have a few options for PG11 > > 1. Do nothing, we reject MERGE > > 2. Implement MERGE for unique index situations only, attempting to > avoid errors (Simon OP) > > 3. Implement MERGE, but without attempting to avoid concurrent ERRORs (Peter) > > 4. Implement MERGE, while attempting to avoid concurrent ERRORs in > cases where that is possible. > > Stephen, Robert, please say which option you now believe we should pick. I think Peter has made a good case for #3, so I lean toward that option. I think #4 is too much of a non-obvious behavior difference between the cases where we can avoid those errors and the cases where we can't, and I don't see where #2 can go in the future other than #4. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to implement a SP-GiST index as a extension module?
On Thu, Nov 2, 2017 at 9:53 AM, Connor Wolf wrote: > As such: > Will compound queries as I describe above basically require a custom type to > make it possible? My (admittedly naive) expectation > is that the eventual query for this index will look something like "SELECT * > FROM example_table WHERE indexed_column <=> target_value < 4;", > with "<=>" being the operator for the relevant distance calculation > (hamming, for the BK tree, numeric for the VP-tree). > > The existing VP-tree code appears to not support multiple operators > whatsoever, probably because it was very preliminary. I'm not an expert in this area in any way whatsoever; I don't know a VP-tree from a BK-tree from a maple tree. However, I can tell you that as a general rule, PostgreSQL index access methods can only apply index quals of the form "WHERE column op value" or ordering criteria of the form "ORDER BY column op value". So, in the above example, you might think about trying to set up the access method so that it can efficiently return values ordered by indexed_column <=> target_value and then wrapping the ORDER BY query in a subselect to cut off fetching values at the correct point. But no operator class for any access method can directly handle that query efficiently as you've written it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
On Thu, Nov 2, 2017 at 9:05 PM, Nico Williams wrote: >> Overloading SET to handle both variables and GUCs seems likely to >> create problems, possibly including security problems. For example, >> maybe a security-definer function could leave behind variables to >> trick the calling code into failing to set GUCs that it intended to >> set. Or maybe creating a variable at the wrong time will just break >> things randomly. > > That's already true of GUCs, since there are no access controls on > set_config()/current_setting(). No, it isn't. Right now, SET always refers to a GUC, never a variable, so there's no possibility of getting confused about whether it's intending to change a GUC or an eponymous variable. Once you make SET able to change either one of two different kinds of objects, then that possibility does exist. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 10:26 PM, Peter Geoghegan wrote: > On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas wrote: >> The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where >> I think things get a lot more dangerous. The problem (as Andres >> pointed out to me this afternoon) is that it seems possible to end up >> with a situation where there should be two HOT chains on a page, and >> because of the weakened xmin/xmax checking rules, we end up thinking >> that the second one is a continuation of the first one, which will be >> all kinds of bad news. That would be particularly likely to happen in >> releases from before we invented HEAP_XMIN_FROZEN, when there's no >> xmin/xmax matching at all, but could happen on later releases if we >> are extraordinarily unlucky (i.e. xmin of the first tuple in the >> second chain happens to be the same as the pre-freeze xmax in the old >> chain, probably because the same XID was used to update the page in >> two consecutive epochs). Fortunately, that commit is (I think) not >> released anywhere. > > FWIW, if you look at the second commit > (22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize > that it doesn't even treat those two cases differently. It was buggy > even on its own terms. The FrozenTransactionId test used an xmin from > HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin(). Oh, wow. You seem to be correct. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Oct 31, 2017 at 5:14 PM, Simon Riggs wrote: > I've proposed a SQL Standard compliant implementation that would do > much more than be new syntax over what we already have. > > So these two claims aren't accurate: "radical difference" and "syntax > sugar over a capability we have". I think those claims are pretty accurate. The design of INSERT .. ON CONFLICT UPDATE and the supporting machinery only work if there is a unique index. It provides radically different behavior than what you get with any other DML statement, with completely novel concurrency behavior, and there is no reasonable way to emulate that behavior in cases where no relevant unique index exists. Therefore, if MERGE eventually uses INSERT .. ON CONFLICT UPDATE when a relevant unique index exists and does something else, such as your proposal of taking a strong lock, or Peter's proposal of doing this in a concurrency-oblivious manner, in other cases, then those two cases will behave very differently. And if, in the meantime, MERGE can only handle the cases where there is a unique index, then it can only handle the cases INSERT .. ON CONFLICT UPDATE can cover, which makes it, as far as I can see, syntactic sugar over what we already have. Maybe it's not entirely - you might be planning to make some minor functional enhancements - but it's not clear what those are, and I feel like whatever it is could be done with less work and more elegance by just extending the INSERT .. ON CONFLICT UPDATE syntax. And it does seem to be your intention to only handle the cases which the INSERT .. ON CONFLICT UPDATE infrastructure can cover, because upthread you wrote this: "I didn't say it but my intention was to just throw an ERROR if no single unique index can be identified." I don't think anybody's putting words into your mouth here. We're just reading what you wrote. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 2, 2017 at 8:28 AM, Craig Ringer wrote: > On 1 November 2017 at 01:55, Peter Geoghegan wrote: >> The problem here is: Iff the first statement uses ON CONFLICT >> infrastructure, doesn't the absence of WHEN NOT MATCHED imply >> different semantics for the remaining updates and deletes in the >> second version of the query? You've removed what seems like a neat >> adjunct to the MERGE, but it actually changes everything else too when >> using READ COMMITTED. > > Would these concerns be alleviated by adding some kind of Pg-specific > decoration that constrained concurrency-safe MERGEs? > > So your first statement would be > > MERGE CONCURRENTLY ... > > and when you removed the WHEN NOT MATCHED clause it'd ERROR because > that's no longer able to be done with the same concurrency-safe > semantics? > > I don't know if this would be helpful TBH, or if it would negate > Simon's compatibility goals. Just another idea. Yes, that fixes the problem. Of course, it also turns MERGE CONCURRENTLY into syntactic sugar for INSERT ON CONFLICT UPDATE, which brings one back to the question of exactly what we're trying to achieve here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 8:25 PM, Alvaro Herrera wrote: > Pushed the reverts. > > I noticed while doing so that REL_10_STABLE contains the bogus commits. > Does that change our opinion regarding what to do for people upgrading > to a version containing the broken commits? I don't think so, because > > 1) we hope that not many people will trust their data to 10.0 > immediately after release > 2) the bug is very low probability > 3) it doesn't look like we can do a lot about it anyway. Just to be clear, it looks like "Fix freezing of a dead HOT-updated tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0 was stamped, but "Fix traversal of half-frozen update chains" (22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is therefore unreleased at present. Users of 10.0 who hit the code introduced by 46c35116ae1acc8826705ef2a7b5d9110f9d6e84 will have XIDs stored in the xmax fields of tuples that predate relfrozenxid. Those tuples will be hinted-committed. That's not good, but it might not really have much in the way of consequences. *IF* the next VACUUM doesn't get confused by the old XID, then it will prune the tuple then and I think we'll be OK. And I think it won't, because it should just call HeapTupleSatisfiesVacuum() and that should see that HEAP_XMAX_COMMITTED is set and not actually try to consult the old CLOG. If that hint bit can ever get lost - or fail to propagate to a standby - then we have more trouble, but the fact that it's set by a logged operation makes me hope that can't happen. Furthermore, that follow-on VACUUM should indeed arrive in due time, because we will not have marked the page all-visible -- HeapTupleSatisfiesVacuum() will NOT have returned HEAPTUPLE_LIVE when called from lazy_scan_heap(), and therefore we will have set all_visible = false. The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where I think things get a lot more dangerous. 
The problem (as Andres pointed out to me this afternoon) is that it seems possible to end up with a situation where there should be two HOT chains on a page, and because of the weakened xmin/xmax checking rules, we end up thinking that the second one is a continuation of the first one, which will be all kinds of bad news. That would be particularly likely to happen in releases from before we invented HEAP_XMIN_FROZEN, when there's no xmin/xmax matching at all, but could happen on later releases if we are extraordinarily unlucky (i.e. xmin of the first tuple in the second chain happens to be the same as the pre-freeze xmax in the old chain, probably because the same XID was used to update the page in two consecutive epochs). Fortunately, that commit is (I think) not released anywhere. Personally, I think it would be best to push the release out a week. I think we understand this well enough now that we can fix it relatively easily, but haste makes bugs, and (I know you're all tired of hearing me say this) patches that implicate the on-disk format are scary. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
On Thu, Oct 26, 2017 at 12:51 PM, Pavel Stehule wrote: > The variables can be modified by SQL command SET (this is taken from > standard, and it natural) > > SET varname = expression; Overloading SET to handle both variables and GUCs seems likely to create problems, possibly including security problems. For example, maybe a security-definer function could leave behind variables to trick the calling code into failing to set GUCs that it intended to set. Or maybe creating a variable at the wrong time will just break things randomly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 4:50 PM, Andres Freund wrote: > I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in > the face of previously corrupted tuple chains and pg_upgraded clusters - > it can lead to tuples being considered related, even though they they're > from entirely independent hot chains. Especially when upgrading 9.3 post > your fix, to current releases. I think this is a key point. If the new behavior were merely not entirely correct, we could perhaps refine it later. But it's not only not correct - it actually has the potential to create new problems that didn't exist before those commits. And if we release without reverting those commits then we can't change our mind later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add some const decorations to prototypes
On Tue, Oct 31, 2017 at 8:26 PM, Tom Lane wrote: > ... but I'm not sure that it's an improvement in cases where you have to > cast away the const somewhere else. I agree. I guess I may be in the minority here but I don't really like decorating things with const too much because I have tended to find that once you start adding const, you end up having to add it in more and more places to avoid warnings, and then eventually that causes you to have to start casting it away. Perhaps I was just Doing It Wrong. Anyway, I don't see much point in having const if you're just going to have to cast it to non-const. Then it wasn't really very const in the first place... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On Thu, Nov 2, 2017 at 1:52 PM, Ashutosh Bapat wrote:
> Right now partition keys are immutable but we don't have much code
> written with that assumption. All the code usually keeps a lock on the
> parent till the time they use the information in the partition key. In
> a distant future, which may not exist, we may support ALTER TABLE ...
> PARTITION BY to change partition keys (albeit at huge cost of data
> movement). If we do that, we will have to remember this one-off
> instance of code which assumes that the partition keys are immutable.

I am pretty sure this is by no means the only piece of code which assumes that.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] hash partitioning
On Thu, Nov 2, 2017 at 1:45 PM, amul sul wrote:
> Yes, you are correct, columns involved in the partitioning are immutable.
>
> I was just worried about any change in the partition key column that
> might change the selected hash function.

Any such change, even if it were allowed, would have to take AccessExclusiveLock on the child.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Walsender timeouts and large transactions
On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek wrote:
> sorry for the delay but I didn't have much time in past weeks to follow
> this thread.

+       TimestampTz now = GetCurrentTimestamp();
+
        /* output previously gathered data in a CopyData packet */
        pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);

        /*
         * Fill the send timestamp last, so that it is taken as late as possible.
         * This is somewhat ugly, but the protocol is set as it's already used for
         * several releases by streaming physical replication.
         */
        resetStringInfo(&tmpbuf);
-       pq_sendint64(&tmpbuf, GetCurrentTimestamp());
+       pq_sendint64(&tmpbuf, now);
        memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64));

This change falsifies the comments: the send timestamp is no longer taken as late as possible. Maybe initialize now just after resetStringInfo() is done.

-       /* fast path */
-       /* Try to flush pending output to the client */
-       if (pq_flush_if_writable() != 0)
-               WalSndShutdown();
+       /* Try taking fast path unless we get too close to walsender timeout. */
+       if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+                                             wal_sender_timeout / 2))
+       {
+               CHECK_FOR_INTERRUPTS();

-       if (!pq_is_send_pending())
-               return;
+               /* Try to flush pending output to the client */
+               if (pq_flush_if_writable() != 0)
+                       WalSndShutdown();
+
+               if (!pq_is_send_pending())
+                       return;
+       }

I think it's only the if (!pq_is_send_pending()) return; that needs to be conditional here, isn't it? The pq_flush_if_writable() can be done unconditionally.

Other than that this looks like a reasonable change to me, but I'm not an expert on this code.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
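To make the proposed control flow concrete, here is a standalone toy version of the fast-path test in the second hunk. Types and units are simplified (TimestampTz-style microseconds assumed) and the function name is invented; it only demonstrates the "within half the walsender timeout since the last reply" condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microsecond timestamps, in the style of TimestampTz (sketch only). */
typedef int64_t ToyTimestampTz;

/*
 * Take the non-blocking fast path only while we are still within half of
 * wal_sender_timeout since the last reply; otherwise fall through to the
 * slow path that can process replies and detect timeout.
 */
static bool
toy_can_take_fast_path(ToyTimestampTz now,
                       ToyTimestampTz last_reply_timestamp,
                       int wal_sender_timeout_ms)
{
    ToyTimestampTz deadline = last_reply_timestamp
        + (ToyTimestampTz) (wal_sender_timeout_ms / 2) * 1000;

    return now < deadline;
}
```

With a 60 s timeout, the fast path stays available for 30 s after the last reply and is refused afterwards; Robert's review point is that the flush itself need not be inside this conditional, only the early return.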
Re: [HACKERS] [POC] hash partitioning
On Wed, Nov 1, 2017 at 3:46 PM, amul sul wrote: > Although partition constraints become more simple, there isn't any performance > gain with 0005 patch. Also I am little skeptic about logic in 0005 where we > copied extended hash function info from the partition key, what if parent is > changed while we are using it? Do we need to keep lock on parent until commit > in > satisfies_hash_partition? I don't think it should be possible for the parent to be changed. I mean, the partition key is altogether immutable -- it can't be changed after creation time. The partition bounds can be changed for individual partitions but that would require a lock on the partition. Can you give an example of the kind of scenario about which you are concerned? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Tue, Oct 31, 2017 at 2:47 PM, Alexander Korotkov wrote:
> However, from the user's perspective, current behavior of
> hot_standby_feedback is just broken, because it both increases bloat and
> doesn't guarantee that a read-only query on standby wouldn't be cancelled
> because of vacuum. Therefore, we should be looking for a solution: if one
> approach isn't good enough, then we should look for another approach.
>
> I can propose the following alternative approach: teach read-only queries on hot
> standby to tolerate concurrent relation truncation. Then, when a
> non-existent heap page is accessed on hot standby, we can know that it was
> deleted by concurrent truncation and should be assumed to be empty. Any
> thoughts?

Sounds like it might break MVCC?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Dynamic result sets from procedures
On Wed, Nov 1, 2017 at 2:38 AM, Peter Eisentraut wrote: > So this is what it can do: > > CREATE PROCEDURE pdrstest1() > LANGUAGE SQL > AS $$ > DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2; > DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3; > $$; > > CALL pdrstest1(); > > and that returns those two result sets to the client. That seems like it is at least arguably a wire protocol break. Today, if you send a string containing only one command, you will only get one answer. I'm not saying that makes this change utterly unacceptable or anything -- but I wonder how much application code it will break, and whether any steps need to be taken to reduce breakage. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
On Thu, Nov 2, 2017 at 8:01 AM, Craig Ringer wrote: > That forces materialization, and I'm guessing part of Tomas's goal > here is to prevent the need to materialize into a temp table / > tuplestore / etc. I get that, but if you're running a query like "SELECT * FROM bigtable", you don't need parallel query in the first place, because a single backend is quite capable of sending back the rows as fast as a client can read them. If you're running a query like "SELECT * FROM bigtable WHERE " then that's a good use case for parallel query, but then materializing it isn't that bad because the result set is a lot smaller than the original table. I am not disputing the idea that there are *some* cases where parallel query is useful and materialization is still undesirable, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise aggregation/grouping
On Wed, Nov 1, 2017 at 6:20 PM, Jeevan Chalke wrote:
> Yep.
> But as David reported earlier, if we remove the first part i.e. adding
> cpu_operator_cost per tuple, Merge Append will be preferred over an Append
> node unlike before. And thus, I thought of better having both, but am not so
> sure. Should we remove that part altogether, or add both in a single
> statement with updated comments?

I was only suggesting that you update the comments.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
On Wed, Nov 1, 2017 at 3:47 AM, Tomas Vondra wrote: > But maybe there's a simpler option - what if we only allow fetches from > the PARALLEL cursor while the cursor is open? That is, this would work: > > BEGIN; > ... > DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...; > FETCH 1000 FROM x; > FETCH 1000 FROM x; > FETCH 1000 FROM x; > CLOSE x; > ... > COMMIT; > > but adding any other command between the OPEN/CLOSE commands would fail. > That should close all the holes with parallel-unsafe stuff, right? I think that still leaves a fair number of scenarios to consider, and the error handling by itself seems pretty thorny. Plus it's kind of a weird mode and, like Craig, I'm not really sure what it gets you. Maybe if somebody has the use case where this would help, they should just do: CREATE TEMP TABLE x AS SELECT * FROM t2 WHERE ...; DECLARE x CURSOR FOR SELECT * FROM x; Since e9baa5e9fa147e00a2466ab2c40eb99c8a700824, that ought to work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
On Wed, Nov 1, 2017 at 7:49 AM, Craig Ringer wrote:
> If the client wants to fetch in chunks it can use a portal and limited
> size fetches. That shouldn't (?) be parallel-unsafe, since nothing
> else can happen in the middle anyway.

I believe sending a limited-size fetch forces serial execution currently. If it's true that nothing else can happen in the middle then we could relax that, but I don't see why that should be true?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw
On Wed, Oct 4, 2017 at 5:58 PM, Ashutosh Bapat wrote: > The view with WCO is local but the modification which violates WCO is > being made on remote server by a trigger on remote table. Trying to > control that doesn't seem to be a good idea, just like we can't > control what rows get inserted on the foreign server when they violate > local constraints. I think that's a fair point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: restrict pg_rewind to whitelisted directories
On Mon, Oct 30, 2017 at 6:44 PM, Chris Travers wrote: > The attached patch is cleaned up and filed for the commit fest this next > month: It's generally better to post the patch on the same message as the discussion thread, or at least link back to the discussion thread from the new one. Otherwise people may have trouble understanding the history. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPLAIN (ANALYZE, BUFFERS) reports bogus temporary buffer reads
On Tue, Oct 17, 2017 at 2:29 AM, Thomas Munro wrote:
> Vik Fearing asked off-list why hash joins appear to read slightly more
> temporary data than they write. The reason is that we notch up a
> phantom block read when we hit the end of each file. Harmless but it
> looks a bit weird and it's easily fixed.
>
> Unpatched, a 16 batch hash join reports that we read 30 more blocks
> than we wrote (2 per batch after the first, as expected):
>
>   Buffers: shared hit=434 read=16234, temp read=5532 written=5502
>
> With the attached patch:
>
>   Buffers: shared hit=547 read=16121, temp read=5502 written=5502

Committed. Arguably we ought to back-patch this, but it's minor so I didn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GUC for cleanup indexes threshold.
On Mon, Oct 16, 2017 at 3:02 PM, Masahiko Sawada wrote: > I guess that is the patch I proposed. However I think that there still > is room for discussion because the patch cannot skip to cleanup vacuum > when aggressive vacuum, which is one of the situation that I really > wanted to skip. Well, I think there are outstanding concerns that the patch in question is not safe, and I don't see that anything has been done to resolve them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ERROR: MultiXactId 3268957 has not been created yet -- apparent wraparound after missused pg_resetxlogs
On Mon, Oct 16, 2017 at 5:41 PM, alain radix wrote: > I’m facing a problem with a PostgreSQL 9.6.2 reporting this error when > selecting a table > > ERROR: MultiXactId 3268957 has not been created yet -- apparent wraparound > > The root cause is not a Postgres bug but a buggy person that used > pg_resetxlogs obviously without reading or understanding the documentation. Hmm, I hope you patched that person... > I’m trying to extract data from the db to create a new one ;-) > > But pg_dump logically end with the same error. > > I’ve seen the row with t_max = 3268957 page_inspect, I might use it to > extract row from the page, but this will not be quite easy. Not sure if this would work, but maybe try BEGIN ... UPDATE the row, perhaps via TID qual ... ROLLBACK? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Mon, Oct 16, 2017 at 5:03 PM, Ashutosh Bapat wrote:
> set_append_rel_size() crashes when it encounters a partitioned table
> with a dropped column. Dropped columns do not have any translations
> saved in AppendRelInfo::translated_vars; the corresponding entry is NULL
> per make_inh_translation_list():
>
> 1802         att = TupleDescAttr(old_tupdesc, old_attno);
> 1803         if (att->attisdropped)
> 1804         {
> 1805             /* Just put NULL into this list entry */
> 1806             vars = lappend(vars, NULL);
> 1807             continue;
> 1808         }
>
> In set_append_rel_size() we try to compute attr_needed for child tables.
> While doing so we try to translate a user attribute number of the parent
> to that of a child, and crash since the translated Var is NULL. Here's a
> patch to fix the crash. The patch also contains a testcase to test dropped
> columns in a partitioned table.
>
> Sorry for not noticing this problem earlier.

OK, committed. This is a good example of how having good code coverage doesn't necessarily mean you've found all the bugs. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
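[Editor's note: the shape of the fix can be illustrated with a stand-alone sketch. The `Var` struct and the counting function below are hypothetical stand-ins for illustration only; the real code walks the `translated_vars` list of an `AppendRelInfo`.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's Var node. */
typedef struct Var
{
	int			varattno;
} Var;

/*
 * Sketch of the pattern the fix enforces: when walking a parent-to-child
 * attribute translation list, entries for dropped columns are NULL (per
 * make_inh_translation_list) and must be skipped, never dereferenced.
 */
static int
count_translatable_attrs(Var **translated_vars, int natts)
{
	int			count = 0;

	for (int i = 0; i < natts; i++)
	{
		if (translated_vars[i] == NULL)
			continue;			/* dropped column: no child translation */
		count++;
	}
	return count;
}
```

The crash happened precisely because a consumer dereferenced one of those NULL entries without such a check.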
Re: [HACKERS] Current int & float overflow checking is slow.
On Mon, Oct 30, 2017 at 11:24 PM, Andres Freund wrote:
>>> Perhaps it should rather be pg_add_s32_overflow, or a similar
>>> naming scheme?
>>
>> Not sure what the s is supposed to be? Signed?
>
> Yes, signed. So we could add a u32 or something complementing the
> functions already in the patch. Even though overflow checks are a heck
> of a lot easier to write for unsigned ints, the intrinsics are still
> faster. I don't have any sort of strong feelings on the naming.

Right, I guess including the s is probably a good idea then.

>> I suggest that if we think we don't need -fwrapv any more, we ought to
>> remove it. Otherwise, we won't find out if we're wrong.
>
> I agree that we should do so at some point not too far away in the
> future. Not the least because we don't specify this kind of C dialect in
> a lot of other compilers. Additionally the flag causes some slowdown
> (because e.g. for loop variables are optimized less). But I'm fairly
> certain it needs a bit more care than I've invested as of now - should
> probably at least compile with -Wstrict-overflow=some-higher-level, and
> with ubsan. I'm fairly certain there's more bogus overflow checks
> around...

Makes sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On Mon, Oct 30, 2017 at 5:52 PM, amul sul wrote:
> Actually, int4[] is also inappropriate type as we have started using a 64bit
> hash function. We need something int8[] which is not available, so that I
> have used ANYARRAYOID in the attached patch (0004).

I don't know why you think int8[] is not available.

rhaas=# select 'int8[]'::regtype;
 regtype
----------
 bigint[]
(1 row)

>> I wrote the following query to detect problems of this type, and I think
>> we might want to just go ahead and add this to the regression test suite,
>> verifying that it returns no rows:
>>
>> select oid::regprocedure, provariadic::regtype, proargtypes::regtype[]
>> from pg_proc where provariadic != 0
>> and case proargtypes[array_length(proargtypes, 1)-1]
>>     when 2276 then 2276 -- any -> any
>>     when 2277 then 2283 -- anyarray -> anyelement
>>     else (select t.oid from pg_type t where t.typarray =
>>           proargtypes[array_length(proargtypes, 1)-1]) end
>> != provariadic;
>
> Added in 0001 patch.

Committed.

> One advantage of current implementation is that we can see which hash
> function are used for the each partitioning column and also we don't need to
> worry about user specified opclass and different input types.
>
> Something similar I've tried in my initial patch version[1], but I have missed
> user specified opclass handling for each partitioning column. Do you want me
> to handle opclass using RelabelType node? I am afraid that, that would make
> the \d+ output more horrible than the current one if non-default opclass used.

Maybe we should just pass the OID of the partition (or both the partition and the parent, so we can get the lock ordering right?) instead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Current int & float overflow checking is slow.
On Mon, Oct 30, 2017 at 4:57 PM, Andres Freund wrote:
> 0001) Introduces pg_{add,sub,mul}{16,32,64}_overflow(a, b, *result)
>       These use compiler intrinsics on gcc/clang. If that's not
>       available, they cast to a wider type and do overflow checks. For
>       64bit there's a fallback for the case 128bit math is not
>       available (here I stole from an old patch of Greg's).
>
>       These fallbacks are, as far as I can tell, C free of overflow
>       related undefined behaviour.

Looks nice.

> Perhaps it should rather be pg_add_s32_overflow, or a similar
> naming scheme?

Not sure what the s is supposed to be? Signed?

> 0002) Converts int.c, int8.c and a smattering of other functions to use
>       the new facilities. This removes a fair amount of code.
>
>       It might make sense to split this up further, but right now that's
>       the set of functions that either are affected performancewise by
>       previous overflow checks, and/or relied on wraparound
>       overflow. There's probably more places, but this is what I found
>       by visual inspection and compiler warnings.

I lack the patience to review this tonight.

> 0003) Removes -fwrapv. I'm *NOT* suggesting we apply this right now, but
>       it seems like an important test for the new facilities. Without
>       0002, tests would fail after this; after it, all tests run
>       successfully.

I suggest that if we think we don't need -fwrapv any more, we ought to remove it. Otherwise, we won't find out if we're wrong.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
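[Editor's note: for readers following along, the intrinsic-plus-wider-type scheme described in 0001 can be sketched roughly as below. The function name matches the proposed naming convention, but the body is an illustration under stated assumptions, not the patch's actual code; the real functions ended up in PostgreSQL's common/int.h and use the project's own typedefs.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: add two signed 32-bit ints, storing the sum in *result.
 * Returns true if the addition overflowed, false otherwise.
 */
static inline bool
pg_add_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
#if defined(__GNUC__) || defined(__clang__)
	/* gcc/clang intrinsic: stores the sum and reports overflow directly */
	return __builtin_add_overflow(a, b, result);
#else
	/* Fallback: compute in a wider type, then range-check the result. */
	int64_t		sum = (int64_t) a + (int64_t) b;

	if (sum < INT32_MIN || sum > INT32_MAX)
		return true;
	*result = (int32_t) sum;
	return false;
#endif
}
```

Both branches avoid signed-overflow undefined behaviour: the intrinsic by definition, and the fallback because the arithmetic happens in int64_t, where the operands cannot overflow.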
Re: [HACKERS] Remove secondary checkpoint
On Mon, Oct 30, 2017 at 7:04 PM, Alvaro Herrera wrote: >> A sort of middle way would be to keep the secondary checkpoint around >> but never try to replay from that point, or only if a specific flag is >> provided. > > Why do you want to keep the secondary checkpoint? If there is no way to > automatically start a recovery from the prior checkpoint, is it really > possible to do the same manually? I think the only advantage of keeping > it is that the WAL files are kept around for a little bit longer. But > is that useful? Surely for any environment where you really care, you > have a WAL archive somewhere, so it doesn't matter if files are removed > from the primary's pg_xlog dir. (apologies for the empty message) I don't really want anything in particular here, other than for the system to be reliable. If we're confident that there's zero value in the secondary checkpoint then, sure, ditch it. Even if you have the older WAL files around in an archive, it doesn't mean that you know where the previous checkpoint start location is ... but actually, come to think of it, if you did need to know that, you could just run pg_waldump to find it. That probably wasn't true when this code was originally written, but it is today. I was mostly just thinking out loud, listing another option rather than advocating for it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove secondary checkpoint
On Mon, Oct 30, 2017 at 7:04 PM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Tue, Oct 24, 2017 at 7:25 PM, Andres Freund wrote:
>>> I think it does the contrary. The current mechanism is, in my opinion,
>>> downright dangerous:
>>> https://www.postgresql.org/message-id/20160201235854.go8...@awork2.anarazel.de
>>
>> A sort of middle way would be to keep the secondary checkpoint around
>> but never try to replay from that point, or only if a specific flag is
>> provided.
>
> Why do you want to keep the secondary checkpoint? If there is no way to
> automatically start a recovery from the prior checkpoint, is it really
> possible to do the same manually? I think the only advantage of keeping
> it is that the WAL files are kept around for a little bit longer. But
> is that useful? Surely for any environment where you really care, you
> have a WAL archive somewhere, so it doesn't matter if files are removed
> from the primary's pg_xlog dir.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove secondary checkpoint
On Tue, Oct 24, 2017 at 7:25 PM, Andres Freund wrote: > I think it does the contrary. The current mechanism is, in my opinion, > downright dangerous: > https://www.postgresql.org/message-id/20160201235854.go8...@awork2.anarazel.de A sort of middle way would be to keep the secondary checkpoint around but never try to replay from that point, or only if a specific flag is provided. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov wrote: > Hello. I made some bugfixes and rewrite the patch. I don't think it's a good idea to deliberately leave the state of the standby different from the state of the master on the theory that it won't matter. I feel like that's something that's likely to come back to bite us. Giving LockAcquireExtended() an argument that causes some AccessExclusiveLocks not all to be logged also seems pretty ugly, especially because there are no comments whatsoever explaining the rationale. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs wrote: > Nothing I am proposing blocks later work. That's not really true. Nobody's going to be happy if MERGE has one behavior in one set of cases and an astonishingly different behavior in another set of cases. If you adopt a behavior for certain cases that can't be extended to other cases, then you're blocking a general-purpose MERGE. And, indeed, it seems that you're proposing an implementation that adds no new functionality, just syntax compatibility. Do we really want or need two syntaxes for the same thing in core? I kinda think Peter might have the right idea here. Under his proposal, we'd be getting something that is, in a way, new. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Mon, Oct 30, 2017 at 9:00 AM, Amit Kapila wrote: > Now that the PARAM_EXTERN issue is fixed, I have rebased this patch. > This patch had been switched to Ready For Committer in last CF, then > Robert had comments which I have addressed, so I think the status > should be switched back to Ready For committer. Let me know if you > think it should be switched to some other status. The change to ExplainPrintPlan doesn't look good to me, because it actually moves the initPlan; I don't think it's good for EXPLAIN to mutate the plan state tree. It should find a way to display the results *as if* the initPlans were attached to the subnode, but without actually moving them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel worker error
On Mon, Oct 30, 2017 at 8:25 AM, Amit Kapila wrote: > Thanks. I have closed this entry in CF app, however, I am not sure > what is the best way to deal with the entry present in PostgreSQL 10 > Open Items page[1]. The entry for this bug seems to be present in > Older Bugs section. Shall we leave it as it is or do we want to do > something else with it? > > [1] - https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items How about just adding an additional bullet point for that item that says "fixed by commit blah blah for 10.1"? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel safety for extern params
On Sat, Oct 28, 2017 at 8:02 PM, Amit Kapila wrote: > I think we need to make changes in exec_simple_recheck_plan to make > the behavior similar to HEAD. With the attached patch, all tests > passed with force_parallel_mode. Committed to REL_10_STABLE. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On Sat, Oct 28, 2017 at 4:52 PM, Chris Travers wrote: > There are still some cleanup bits needed here but I wanted to get feedback > on my general direction. > > I hope to submit for commit fest soon if the general feedback is good. I think you should submit to the CommitFest regardless, to increase your chances of getting feedback. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel worker error
On Sun, Oct 29, 2017 at 12:02 PM, Amit Kapila wrote: > This patch no longer applies, so attached rebased patches. I have > also created patches for v10 as we might want to backpatch the fix. > Added the patch in CF (https://commitfest.postgresql.org/15/1342/) Thanks. I picked the second variant, committed, and also back-patched to 9.6. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implementing pg_receivewal --no-sync
On Sun, Oct 29, 2017 at 3:42 AM, Michael Paquier wrote: > Okay. Here is an updated patch incorporating those comments. Committed with a little wordsmithing on the documentation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
...between WITH and the parenthesis which follows.

In 0003, the changes to partition_bounds_copy claim that I shouldn't worry about the fact that typlen is set to 4 because datumCopy won't use it for a pass-by-value datatype, but I think that calling functions with incorrect arguments and hoping that they ignore them and therefore nothing bad happens doesn't sound like a very good idea. Fortunately, I think the actual code is fine; I think we just need to change the comments. For hash partitioning, the datums array always contains two integers, which are of type int4, which is indeed a pass-by-value type of length 4 (note that if we were using int8 for the modulus and remainder, we'd need to set byval to FLOAT8PASSBYVAL). I would just write this as:

    if (hash_part)
    {
        typlen = sizeof(int32);     /* always int4 */
        byval = true;               /* int4 is pass-by-value */
    }

+   for (i = 0; i < nkeys; i++)
+   {
+       if (!isnull[i])
+           rowHash = hash_combine64(rowHash, DatumGetUInt64(hash_array[i]));
+   }

Excess braces. I think it might be better to inline the logic in mix_hash_value() into each of the two callers. Then, the callers wouldn't need Datum hash_array[PARTITION_MAX_KEYS]; they could just fold each new hash value into a uint64 value. That seems likely to be slightly faster and I don't see any real downside.

rhaas=# create table natch (a citext, b text) partition by hash (a);
ERROR:  XX000: missing support function 2(16398,16398) in opfamily 16437
LOCATION:  RelationBuildPartitionKey, relcache.c:954

It shouldn't be possible to reach an elog() from SQL, and this is not a friendly error message.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
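[Editor's note: the fold-as-you-go approach suggested above might look roughly like the stand-alone sketch below. hash_combine64() is modeled on the function of that name in PostgreSQL's hash utility header; compute_partition_hash() and its arguments are hypothetical stand-ins for the real calling code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modeled on PostgreSQL's hash_combine64() (a boost-style 64-bit mixer). */
static inline uint64_t
hash_combine64(uint64_t a, uint64_t b)
{
	a ^= b + UINT64_C(0x49a0f4dd15e5a8e3) + (a << 54) + (a >> 7);
	return a;
}

/*
 * Sketch of the suggested inlining: fold each column's hash straight into
 * a running uint64 instead of staging the values in a
 * Datum hash_array[PARTITION_MAX_KEYS] first.
 */
static uint64_t
compute_partition_hash(const uint64_t *col_hashes, const bool *isnull, int nkeys)
{
	uint64_t	rowHash = 0;

	for (int i = 0; i < nkeys; i++)
	{
		/* NULL columns contribute nothing, as in the quoted loop */
		if (!isnull[i])
			rowHash = hash_combine64(rowHash, col_hashes[i]);
	}
	return rowHash;
}
```

Dropping the intermediate array saves a copy per key and keeps the working state in a single register-sized value.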
Re: [HACKERS] Fix typo in blvacuum.c
On Tue, Oct 17, 2017 at 3:48 AM, Seki, Eiji wrote: > I found a typo in comment of blvacuum.c, so attach the patch for it. Thanks. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Thu, Oct 19, 2017 at 9:05 AM, Amit Khandekar wrote: >> + *ExecAppendEstimate >> + * >> + *estimates the space required to serialize Append node. >> >> Ugh, this is wrong, but I notice it follows various other >> equally-wrong comments for other parallel-aware node types. I guess >> I'll go fix that. We are not in serializing the Append node. > > I didn't clealy get this. Do you think it should be "space required to > copy the Append node into the shared memory" ? No, because the Append node is *NOT* getting copied into shared memory. I have pushed a comment update to the existing functions; you can use the same comment for this patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise aggregation/grouping
On Fri, Oct 27, 2017 at 1:01 PM, Jeevan Chalke wrote:
> 1. Added separate patch for costing Append node as discussed up-front in the
>    patch-set.
> 2. Since we now cost Append node, we don't need partition_wise_agg_cost_factor
>    GUC. So removed that. The remaining patch hence merged into main
>    implementation patch.
> 3. Updated rows in test-cases so that we will get partition-wise plans.

With 0006 applied, cost_merge_append() is now a little bit confused:

    /*
     * Also charge a small amount (arbitrarily set equal to operator cost) per
     * extracted tuple.  We don't charge cpu_tuple_cost because a MergeAppend
     * node doesn't do qual-checking or projection, so it has less overhead
     * than most plan nodes.
     */
    run_cost += cpu_operator_cost * tuples;

    /* Add MergeAppend node overhead like we do it for the Append node */
    run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;

The first comment says that we don't add cpu_tuple_cost, and the second one then adds half of it anyway.

I think it's fine to have a #define for DEFAULT_APPEND_COST_FACTOR, because as you say it's used twice, but I don't think that should be exposed in cost.h; I'd make it private to costsize.c and rename it to something like APPEND_CPU_COST_MULTIPLIER. The word DEFAULT, in particular, seems useless to me, since there's no provision for it to be overridden by a different value.

What testing, if any, can we think about doing with this plan to make sure it doesn't regress things? For example, if we do a TPC-H run with partitioned tables and partition-wise join enabled, will any plans change with this patch? Do they get faster or not? Anyone have other ideas for what to test?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
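[Editor's note: to make the arithmetic concrete, here is a minimal stand-alone sketch of the per-tuple charge under discussion. The constant values, the APPEND_CPU_COST_MULTIPLIER name, and its 0.5 value are assumptions for illustration ("half of cpu_tuple_cost", per the message above), not the patch's actual definitions.]

```c
#include <assert.h>

/* Hypothetical stand-ins for the planner's GUC-derived cost constants. */
#define CPU_TUPLE_COST		0.01
#define CPU_OPERATOR_COST	0.0025

/*
 * Private multiplier, per the suggestion: defined in the costing file only,
 * not exposed in a header, and without the misleading DEFAULT_ prefix since
 * nothing can override it.
 */
#define APPEND_CPU_COST_MULTIPLIER	0.5

/* Sketch: per-tuple run cost MergeAppend would accumulate under this scheme. */
static double
merge_append_per_tuple_cost(double tuples)
{
	double		run_cost = 0.0;

	/* small charge per extracted tuple, as in cost_merge_append() */
	run_cost += CPU_OPERATOR_COST * tuples;
	/* plus the Append-node-style overhead the patch introduces */
	run_cost += CPU_TUPLE_COST * APPEND_CPU_COST_MULTIPLIER * tuples;
	return run_cost;
}
```

With these placeholder constants, 1000 tuples cost 2.5 (operator charge) plus 5.0 (append overhead), i.e. 7.5 cost units.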
Re: [HACKERS] Adding table_constraint description in ALTER TABLE synopsis
On Thu, Oct 26, 2017 at 3:23 PM, Lætitia Avrot wrote: > In documentation, I've found that table_constraint is used in the ALTER > TABLE synopsis but that definition of table_constraint is missing, so I > submitted bug #14873. > > I found the table_constraint definition in the CREATE TABLE synopsis and I > just copied/pasted it on the ALTER TABLE synopsis. > > The patch should apply to MASTER.I build and tested it successfully on my > computer. > > There shouldn't be any platform-specific content. > > You will find enclosed my patch. I tried my best to follow instructions on > how to submit a patch. I'd say you did a good job. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typos in src/backend/optimizer/README
On Fri, Oct 27, 2017 at 9:34 AM, Etsuro Fujita wrote: > This sentence in the section of Partition-wise joins in > src/backend/optimizer/README should be fixed: "This technique of breaking > down a join between partition tables into join between their partitions is > called partition-wise join." > > (1) s/a join between partition tables/a join between partitioned tables/ > (2) s/join between their partitions/joins between their partitions/ > > It might be okay to leave #2 as-is, but I'd like to propose to change that > way to make the meaning clear. I think you are right. I have committed the patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] unsafe tuple releasing in get_default_partition_oid
On Sat, Oct 28, 2017 at 10:03 AM, Julien Rouhaud wrote: > I just noticed that get_default_partition_oid() tries to release the > tuple even if it isn't valid. > Trivial patch attached. Oops. I wonder how that managed to survive testing. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Thu, Oct 26, 2017 at 4:11 PM, Masahiko Sawada wrote: >> Why do we want the the backend to linger behind, once it has added its >> foreign transaction entries in the shared memory and informed resolver >> about it? The foreign connections may take their own time and even >> after that there is no guarantee that the foreign transactions will be >> resolved in case the foreign server is not available. So, why to make >> the backend wait? > > Because I don't want to break the current user semantics. that is, > currently it's guaranteed that the subsequent reads can see the > committed result of previous writes even if the previous transactions > were distributed transactions. Right, this is very important, and having the backend wait for the resolver(s) is, I think, the right way to implement it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel safety for extern params
On Mon, Oct 16, 2017 at 12:59 PM, Amit Kapila wrote: >> I think the Param case should be mentioned after "... but" not before >> - i.e. referencing the child node's output... but setrefs.c might also >> have copied a Const or Param is-is. > > I am not sure if we can write the comment like that (.. copied a Const > or Param as-is.) because fix_upper_expr_mutator in setrefs.c has a > special handling for Var and Param where constants are copied as-is > via expression_tree_mutator. Also as proposed, the handling for > params is more like Var in exec_save_simple_expr. I committed fix_parallel_mode_nested_execution_v2.patch with some cosmetic tweaks. I back-patched it to 10 and 9.6, then had to fix some issues reported by Tom as followup commits. With respect to the bit quoted above, I rephrased the comment in a slightly different way that hopefully is a reasonable compromise, combined it with the main patch, and pushed it to master. Along the way I also got rid of the if statement you introduced and just made the Assert() more complicated instead, which seems better to me. When I tried back-porting the patch to v10 I discovered that the plpgsql changes conflict heavily and that ripping them out all the way produces regression failures under force_parallel_mode. I think I see why that's happening but it's not obvious how to me how to adapt b73f1b5c29e0ace5a281bc13ce09dea30e1b66de to the v10 code. Do you want to have a crack at it or should we just leave this as a master-only fix? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera wrote: > I noticed that RelationBuildPartitionKey is generating a partition key > in a temp context, then creating a private context and copying the key > into that. That seems leftover from some previous iteration of some > other patch; I think it's pretty reasonable to create the new context > right from the start and allocate the key there directly instead. Then > there's no need for copy_partition_key at all. We could do that, but the motivation for the current system was to avoid leaking memory in a long-lived context. I think the same technique is used elsewhere for similar reasons. I admit I haven't checked whether there would actually be a leak here if we did it as you propose, but I wouldn't find it at all surprising. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index only scan for cube and seg
On Thu, Oct 26, 2017 at 12:22 PM, Andrey Borodin wrote: > For cube there is new default opclass. I seem to recall that changing the default opclass causes unsolvable problems with upgrades. You might want to check the archives for previous discussions of this issue; unfortunately, I don't recall the details off-hand. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggs wrote: > I didn't say it but my intention was to just throw an ERROR if no > single unique index can be identified. > > It could be possible to still run MERGE in that situation but we would > need to take a full table lock at ShareRowExclusive. It's quite likely > that such statements would throw duplicate update errors, so I > wouldn't be aiming to do anything with that for PG11. Like Peter, I think taking such a strong lock for a DML statement doesn't sound like a very desirable way forward. It means, for example, that you can only have one MERGE in progress on a table at the same time, which is quite limiting. It could easily be the case that you have multiple MERGE statements running at once but they touch disjoint groups of rows and therefore everything works. I think the code should be able to cope with concurrent changes, if nothing else by throwing an ERROR, and then if the user wants to ensure that doesn't happen by taking ShareRowExclusiveLock they can do that via an explicit LOCK TABLE statement -- or else they can prevent concurrency by any other means they see fit. Other problems with taking ShareRowExclusiveLock include (1) probable lock upgrade hazards and (2) do you really want MERGE to kick autovacuum off of your giant table? Probably not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 10:55 AM, Simon Riggs wrote: > Questions? I think one of the reasons why Peter Geoghegan decided to pursue INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL syntax, he felt free to mandate a non-standard SQL requirement, namely the presence of a unique index on the arbiter columns. If MERGE's join clause happens to involve equality conditions on precisely the same set of columns as some unique index on the target table, then I think you can reuse the INSERT .. ON CONFLICT UPDATE infrastructure and I suspect there won't be too many problems. However, if it doesn't, then what? You could decree that such cases will fail, but that might not meet your use case for developing the feature. Or you could try to soldier on without the INSERT .. ON CONFLICT UPDATE machinery, but that means, I think, that sometimes you will get serialization anomalies - at least, I think, you will sometimes obtain results that couldn't have been obtained under any serial order of execution, and maybe it would end up being possible to fail with serialization errors or unique index violations. In the past, there have been objections to implementations of MERGE which would give rise to such serialization anomalies, but I'm not sure we should feel bound by those discussions. One thing that's different is that the common and actually-useful case can now be made to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE; if less useful cases are vulnerable to some weirdness, maybe it's OK to just document the problems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: BRIN bloom indexes
On Fri, Oct 27, 2017 at 2:55 PM, Alvaro Herrera wrote: > I was rather thinking that if we can make this very robust against the > index growing out of proportion, we should consider ditching the > original minmax and replace it with multirange minmax, which seems like > it'd have much better behavior. If the multirange stuff can be done in such a way that it's just an updated version of the same opclass, and backward-compatible on disk, then I think this would be OK. But otherwise I don't think we ditch what already exists. That would break upgrades via both pg_upgrade and pg_dump, which seems like too high a price to pay to get rid of some arguably-worse code. It's actually WORSE to drop an opclass (which will make dumps not restore) than to do something like bump HASH_VERSION (which doesn't affect pg_dump at all and for pg_upgrade only requires post-upgrade steps rather than failing outright). > I don't see any reason to put any of this in contrib. Well, for one thing, it makes it easier to drop stuff later if we decide we don't really want it. I think that whichever BRIN opclasses are thought to be high quality and of general utility can go into core, just as we've done with other index AMs. However, if we think that something is useful-ish but maybe not something to which we want to make a permanent commitment, putting it into contrib is good for that. Upgrades are easier for things in contrib, too, because there's a built-in mechanism for people to try updating the SQL extensions (ALTER EXTENSION .. UPDATE) and if it fails they can adjust things and try again. When you just make a hard change to SQL definitions in a new release, any failures that result from those changes just turn into upgrade failures, which is IMHO a lot more painful than a failure to update an extension version while the database is still up and usable the whole time. 
For instance, if pg_stat_activity were bundled in an extension and we made the C code backward-compatible with old extension versions, then some of the upgrade pain users have had with that over the years could have been avoided. People could update to the new server version without failures and then at their leisure try to update the extension. If the update failed due to dependencies, then they would have time to figure out what to do about it and try again later; in the meantime, they'd be on the new version. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)
teaches libpq to ignore the NegotiateProtocolVersion message. With that patch applied, make check-world passes, which seems to show that the server-side changes are not totally broken. More testing would be great, of course. Some thoughts on the future: - libpq should grow an option to force a specific protocol version. Andres already proposed one to force 2.0, but now we probably want to generalize that to also allow forcing a particular minor version. That seems useful for testing, if nothing else, and might let us add TAP tests that this stuff works as intended. - Possibly we should commit the portion of the testing patch which ignores NegotiateProtocolVersion to v11, maybe also adding a connection status function that lets users inquire about whether a NegotiateProtocolVersion message was received and, if so, what parameters it reported as unrecognized and what minor version it reported the server as speaking. The existing PQprotocolVersion interface looks inadequate, as it seems to return only the major version. - On further reflection, I think the reconnect functionality you proposed previously is probably a good idea. It won't be necessary with servers that have been patched to send NegotiateProtocolVersion, but there may be quite a few that haven't for a long time to come, and although an automated reconnect is a little annoying, it's a lot less annoying than an outright connection failure. So that part of your patch should probably be resubmitted when and if we bump the version to 3.1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company pgwire-be-rmh.patch Description: Binary data pgwire-libpq-test.patch Description: Binary data
Re: [HACKERS] Re: Is anything preventing us from allowing write to foreign tables from standby?
On Thu, Oct 26, 2017 at 11:14 AM, Ashutosh Bapat wrote: > In a nearby thread, we are discussing about atomic commit of > transactions involving foreign transactions. For maintaining > consistency, atomicity of transactions writing to foreign server, we > will need to create local transactions. Will that be possible on > standby? Obviously, we could add a restriction that no consistency and > atomic commit is guaranteed when foreign servers are written from a > standby. I am not sure how easy would be to impose such a restriction > and whether such a restriction would be practical. Yeah, that feature definitely can't work on a standby. But we could probably allow writes when that feature is not in use. One thing that bothers me about Alexander's patch is that there wouldn't be any way to distinguish between those two cases. Maybe we need a callback to ask the FDW "given the configured options, could you work on a standby?" However, as Michael also points out, it's arguably wrong to allow a nominally read-only transaction to write data regardless of whether it works. In the case of a standby it could be argued that your transaction is only read-only because you had no other choice, but nonetheless that's how it is marked. I have a feeling that if we extend the definition of "read only" to mean "sometimes allow writes", we may regret it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Implementing pg_receivewal --no-sync
On Wed, Oct 25, 2017 at 10:03 PM, Michael Paquier wrote: > This sentence is actually wrong, a feedback message is never sent with > the feedback message. Eh? I think this looks basically fine, though I'd omit the short option for it. There are only so many letters in the alphabet, so let's not use them up for developer-convenience options. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company