Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
return typtup;
+}

A function as general as typeGet() certainly does not belong in
parse_clause.c in the middle of a long list of temporal functions.

This particular function is also a bad idea in general, because typtup
is only a valid pointer until ReleaseSysCache() is called.  This will
appear to work normally because, normally, no relevant cache
invalidation messages will show up at the wrong time, but if one does
then typtup will be pointing off into outer space.  You can find bugs
like this by testing with CLOBBER_CACHE_ALWAYS defined (warning: this
makes things very slow).

But the general point I want to make here is actually about inventing
your own idioms vs. using the ones we've got -- if you're going to
invent new general-purpose primitives, they need to go next to the
existing functions that do similar things, not in whatever part of the
code you first decided you needed them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
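The safe idiom is to copy whatever you need out of the cache entry while it is still pinned, and only then release it. A self-contained sketch of that pattern (the cache here is a toy stand-in, not PostgreSQL's actual syscache API; all names are invented for illustration):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a syscache entry; names are illustrative only. */
typedef struct TypeCacheEntry
{
    int  typlen;
    char typname[64];
} TypeCacheEntry;

/* Simulated cache lookup: returns a pinned entry. */
static TypeCacheEntry *
search_cache(const char *name)
{
    TypeCacheEntry *entry = malloc(sizeof(TypeCacheEntry));

    entry->typlen = 4;
    strncpy(entry->typname, name, sizeof(entry->typname) - 1);
    entry->typname[sizeof(entry->typname) - 1] = '\0';
    return entry;
}

/*
 * Simulated release: after this, the entry may be recycled at any time,
 * so we clobber it (in the spirit of CLOBBER_CACHE_ALWAYS) and free it.
 */
static void
release_cache(TypeCacheEntry *entry)
{
    memset(entry, 0x7f, sizeof(*entry));
    free(entry);
}

/*
 * Safe idiom: copy the needed field while the entry is valid, then
 * release.  Returning the raw entry pointer after release_cache(), as
 * the typeGet() in question effectively does, would hand the caller a
 * dangling pointer.
 */
static int
get_type_len(const char *name)
{
    TypeCacheEntry *tup = search_cache(name);
    int typlen = tup->typlen;   /* copy before releasing */

    release_cache(tup);
    return typlen;
}
```

The same shape applies in the backend: use the data (or copy it) between SearchSysCache() and ReleaseSysCache(), never after.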
Re: [HACKERS] GatherMerge misses to push target list
On Mon, Sep 18, 2017 at 7:02 AM, Rushabh Lathia wrote:
>> In that case, can you please mark the patch [1] as ready for committer in
>> CF app
>
> Done.

I think this patch is mostly correct, but I think the change to
planner.c isn't quite right.  ordered_rel->reltarget is just a dummy
target list that produces nothing.  Instead, I think we should pass
path->pathtarget, representing the idea that whatever Gather Merge
produces as output is the same as what you put into it.  See attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pushdown-gathermerge-tlist.patch
Description: Binary data
Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages
On Thu, Nov 9, 2017 at 3:55 AM, amul sul wrote:
> It took me a little while to understand this calculation.  You have moved this
> code from tbm_create(), but I think you should move the following
> comment as well:

I made an adjustment that I hope will address your concern here, made
a few other adjustments, and committed this.

One point of concern that wasn't entirely addressed in the above
discussion is the accuracy of this formula:

+               lossy_pages = Max(0, heap_pages - maxentries / 2);

When I first looked at Dilip's test results, I thought maybe this
formula was way off.  But on closer study, the formula does a decent
(not fantastic) job of estimating the number of exact pages.  The fact
that the number of lossy pages is off is just because the Mackert and
Lohman formula is overestimating how many pages are fetched.

Now, in Dilip's results, this formula more often than not - but not
invariably - predicted more exact pages than we actually got.  So
possibly instead of maxentries / 2 we could subtract maxentries or
some other multiple of maxentries.  I don't know what's actually best
here, but I think there's a strong argument that this is an
improvement as it stands, and we can adjust it later if it becomes
clear what would be better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
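Restated as a standalone function for clarity -- this is only a sketch of the formula quoted above; the real planner code does the arithmetic inline, with doubles, alongside the Mackert-Lohman fetch estimate:

```c
#include <assert.h>

/*
 * Sketch of the committed lossy-page estimate: once the bitmap would
 * hold more distinct pages than maxentries allows, the excess pages
 * (less a fudge factor of maxentries / 2) are assumed to go lossy.
 * Integer types are used here purely for illustration.
 */
static long
estimate_lossy_pages(long heap_pages, long maxentries)
{
    long lossy_pages = heap_pages - maxentries / 2;

    return lossy_pages > 0 ? lossy_pages : 0;   /* Max(0, ...) */
}

/* The exact pages are then whatever is left over. */
static long
estimate_exact_pages(long heap_pages, long maxentries)
{
    return heap_pages - estimate_lossy_pages(heap_pages, maxentries);
}
```

Subtracting maxentries instead of maxentries / 2, as floated above, would just change the fudge term in estimate_lossy_pages().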
Re: [HACKERS] parallelize queries containing initplans
On Fri, Nov 10, 2017 at 2:24 PM, Tom Lane wrote:
> Robert Haas writes:
>> I decided to try instead teaching the planner to keep track of the
>> types of PARAM_EXEC parameters as they were created, and that seems to
>> work fine.  See 0001, attached.
>
> I did not look at the other part, but 0001 looks reasonable to me.

Thanks for looking.

> I might quibble with the grammar in the generate_new_param comment:
>
> -        * need to record the PARAM_EXEC slot number as being allocated.
> +        * need to make sure we have record the type in paramExecTypes (otherwise,
> +        * there won't be a slot allocated for it).
>
> I'd just go with "need to record the type in ..."

Noted.

> Also, I wonder whether the InvalidOid hack in SS_assign_special_param
> requires commentary.  It might be safer to use a valid type OID there,
> perhaps VOIDOID or INTERNALOID.

I think it's pretty straightforward -- if, as the existing comments
say, no Param node will be present and no value will be stored, then
we do not and cannot record the type of the value that we're not
storing.  There is existing precedent for using InvalidOid to denote
the absence of a parameter -- cf. copyParamList, SerializeParamList.
That convention was not invented by me or the parallel query stuff,
although it has become more widespread for that reason.  I am
disinclined to have this patch invent a New Way To Do It.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
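A minimal sketch of the InvalidOid-as-absent convention referenced above (cf. copyParamList, SerializeParamList): a serializer simply skips slots whose type OID is InvalidOid. The function name and the Oid typedef here are simplified stand-ins, not the backend's actual code:

```c
#include <assert.h>

/* Simplified stand-ins for the backend's definitions. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/*
 * Count how many parameter slots would actually be serialized.
 * Slots whose type is InvalidOid denote absent parameters -- no Param
 * node exists and no value is stored -- so they are skipped, exactly
 * as copyParamList and SerializeParamList skip them.
 */
static int
count_serialized_params(const Oid *paramtypes, int nparams)
{
    int count = 0;

    for (int i = 0; i < nparams; i++)
    {
        if (paramtypes[i] != InvalidOid)
            count++;
    }
    return count;
}
```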
Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
On Wed, Sep 27, 2017 at 7:07 AM, amul sul wrote:
> Attaching POC patch that throws an error in the case of a concurrent update
> to an already deleted tuple due to UPDATE of partition key[1].
>
> In a normal update new tuple is linked to the old one via ctid forming
> a chain of tuple versions but UPDATE of partition key[1] move tuple
> from one partition to an another partition table which breaks that chain.

This patch needs a rebase.  It has one whitespace-only hunk that
should possibly be excluded.

I think the basic idea of this is sound.  Either you or Amit needs to
document the behavior in the user-facing documentation, and it needs
tests that hit every single one of the new error checks you've added
(obviously, the tests will only work in combination with Amit's
patch).  The isolation test framework should be sufficient to write
such tests.

It needs some more extensive comments as well.  The fact that we're
assigning a meaning to ip_blkid -> InvalidBlockNumber is a big deal,
and should at least be documented in itemptr.h in the comments for the
ItemPointerData structure.

I suspect ExecOnConflictUpdate needs an update for the
HeapTupleUpdated case similar to what you've done elsewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
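To illustrate the ip_blkid overloading being discussed: setting the block number to InvalidBlockNumber marks the old tuple as having been moved to another partition, so a concurrent updater chasing the ctid chain can detect the break and error out. The struct below is deliberately simplified (the real ItemPointerData stores the block number as a two-halves BlockIdData), so treat it as a sketch only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Simplified stand-in for ItemPointerData. */
typedef struct ItemPointerData
{
    BlockNumber ip_blkid;   /* real struct: BlockIdData (two uint16s) */
    uint16_t    ip_posid;
} ItemPointerData;

/*
 * Mark a dead tuple's ctid to say "my successor went to another
 * partition"; returns its argument for convenience.
 */
static ItemPointerData *
mark_moved_to_other_partition(ItemPointerData *ctid)
{
    ctid->ip_blkid = InvalidBlockNumber;
    return ctid;
}

/*
 * A concurrent updater following the update chain would call this and
 * raise an error instead of trying to follow a chain that no longer
 * exists within this partition.
 */
static bool
tuple_moved_to_other_partition(const ItemPointerData *ctid)
{
    return ctid->ip_blkid == InvalidBlockNumber;
}
```

This is exactly the kind of invariant that, per the review comment, belongs in the comments for ItemPointerData in itemptr.h.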
Re: [HACKERS] parallelize queries containing initplans
On Tue, Nov 7, 2017 at 4:45 AM, Amit Kapila wrote:
> As mentioned, changed the status of the patch in CF app.

I spent some time reviewing this patch today and found myself still
quite uncomfortable with the fact that it was adding execution-time
work to track the types of parameters - types that would usually not
even be used.  I found the changes to nodeNestLoop.c to be
particularly objectionable, because we could end up doing the work
over and over when it is actually not needed at all, or at most once.
I decided to try instead teaching the planner to keep track of the
types of PARAM_EXEC parameters as they were created, and that seems to
work fine.  See 0001, attached.

0002, attached, is my worked-over version of the rest of the patch.  I
moved the code that serializes and deserializes PARAM_EXEC -- which
seemed like a strange choice to put in nodeSubplan.c -- to
execParallel.c.  I removed the type OID from the serialization format
because there's no reason to do that any more; the worker already
knows the types from the plan.  I did some renaming of the functions
involved and some adjustment of the comments to refer to "PARAM_EXEC
parameters" instead of initPlan parameters, because there's no reason
that I can see why this can only work for initPlans.  A Gather node on
the inner side of a nested loop doesn't sound like a great idea, but I
think this infrastructure could handle it (though it would need some
more planner work).

I broke a lot of long lines in your version of the patch into
multiple lines; please try to be attentive to this issue when writing
patches in general, as it is a bit tedious to go through and insert
line breaks in many places.

Please let me know your thoughts on the attached patches.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0001-param-exec-types-v1.patch
Description: Binary data

0002-pq-pushdown-initplan-rebased.patch
Description: Binary data
Re: [HACKERS] Incorrect comment for build_child_join_rel
On Fri, Nov 10, 2017 at 4:34 AM, Etsuro Fujita wrote:
> Here is a small patch for $Subject.

Good catch.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks
On Fri, Nov 10, 2017 at 10:34 AM, Tom Lane wrote:
> I think as far as that goes, we can just change to "Therefore, by default
> their use is restricted ...".  Then I suggest adding a para
> after that, with wording along the lines of
>
> It is possible to GRANT use of server-side lo_import and lo_export to
> non-superusers, but careful consideration of the security implications
> is required.  A malicious user of such privileges could easily parlay
> them into becoming superuser (for example by rewriting server
> configuration files), or could attack the rest of the server's file
> system without bothering to obtain database superuser privileges as
> such.  Access to roles having such privilege must therefore be guarded
> just as carefully as access to superuser roles.  Nonetheless, if use
> of server-side lo_import or lo_export is needed for some routine task,
> it's safer to use a role of this sort than full superuser privilege,
> as that helps to reduce the risk of damage from accidental errors.

+1.  That seems like great language to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] Faster processing at Gather node
On Fri, Nov 10, 2017 at 5:44 AM, Amit Kapila wrote:
> I am seeing the assertion failure as below on executing the above
> mentioned Create statement:
>
> TRAP: FailedAssertion("!(!(tup->t_data->t_infomask & 0x0008))", File:
> "heapam.c", Line: 2634)
> server closed the connection unexpectedly
> This probably means the server terminated abnormally

OK, I see it now.  Not sure why I couldn't reproduce this before.

I think the problem is not actually with the code that I just wrote.
What I'm seeing is that the slot descriptor's tdhasoid value is false
for both the funnel slot and the result slot; therefore, we conclude
that no projection is needed to remove the OIDs.  That seems to make
sense: if the funnel slot doesn't have OIDs and the result slot
doesn't have OIDs either, then we don't need to remove them.
Unfortunately, even though the funnel slot descriptor is marked
tdhasoid = false, the tuples being stored there actually do have OIDs.
And that is because they are coming from the underlying sequential
scan, which *also* has OIDs despite the fact that tdhasoid for its
slot is false.

This had me really confused until I realized that there are two
processes involved.  The problem is that we don't pass eflags down to
the child process -- so in the user backend, everybody agrees that
there shouldn't be OIDs anywhere, because EXEC_FLAG_WITHOUT_OIDS is
set.  In the parallel worker, however, it's not set, so the worker
feels free to do whatever comes naturally, and in this test case that
happens to be returning tuples with OIDs.  Patch for this attached.

I also noticed that the code that initializes the funnel slot is using
its own PlanState rather than the outer plan's PlanState to call
ExecContextForcesOids.  I think that's formally incorrect, because the
goal is to end up with a slot that is the same as the outer plan's
slot.  It doesn't matter because ExecContextForcesOids doesn't care
which PlanState it gets passed, but the comments in
ExecContextForcesOids imply that someday it might, so perhaps it's
best to clean that up.  Patch for this attached, too.

And here are the other patches again, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0001-pass-eflags-to-worker-v1.patch
Description: Binary data

0002-forces-oids-neatnikism-v1.patch
Description: Binary data

0003-skip-gather-project-v2.patch
Description: Binary data

0004-shm-mq-less-spinlocks-v2.patch
Description: Binary data

0005-shm-mq-reduce-receiver-latch-set-v1.patch
Description: Binary data

0006-remove-memory-leak-protection-v1.patch
Description: Binary data
Re: [HACKERS] [POC] Faster processing at Gather node
On Thu, Nov 9, 2017 at 9:31 PM, Amit Kapila wrote:
> Have you set force_parallel_mode=regress; before running the
> statement?

Yes, I tried that first.

> If so, then why you need to tune other parallel query
> related parameters?

Because I couldn't get it to break the other way, I then tried this.
Instead of asking me what I did, can you tell me what I need to do?
Maybe a self-contained reproducible test case including exactly what
goes wrong on your end?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] hash partitioning
On Wed, Nov 1, 2017 at 6:16 AM, amul sul wrote:
> Fixed in the 0003 patch.

I have committed this patch set with the attached adjustments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

hash-adjustments.patch
Description: Binary data
Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks
On Thu, Nov 9, 2017 at 2:56 PM, Stephen Frost wrote:
> I agree that it may not be obvious which cases lead to a relatively easy
> way to obtain superuser and which don't, and that's actually why I'd
> much rather it be something that we're considering and looking out for
> because, frankly, we're in a much better position generally to realize
> those cases than our users are.

I disagree.  It's flattering to imagine that PostgreSQL developers, as
a class, are smarter than PostgreSQL users, but it doesn't match my
observations.

> Further, I agree entirely that we
> shouldn't be deciding that certain capabilities are never allowed to be
> given to a user- but that's why superuser *exists* and why it's possible
> to give superuser access to multiple users.  The question here is if
> it's actually sensible for us to make certain actions be grantable to
> non-superusers which allow that role to become, or to essentially be, a
> superuser.  What that does, just as it does in the table case, from my
> viewpoint at least, is make it *look* to admins like they're grant'ing
> something less than superuser when, really, they're actually giving the
> user superuser-level access.  That violates the POLA because the admin
> didn't write 'ALTER USER joe WITH SUPERUSER', they just wrote 'GRANT
> EXECUTE ON lo_import() TO joe;'.

This is exactly the kind of thing that I'm talking about.  Forcing an
administrator to hand out full superuser access instead of being able
to give just lo_import() makes life difficult for users for no good
reason.  The existence of a theoretically-exploitable security
vulnerability is not tantamount to really having access, especially on
a system with a secured logging facility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] why not parallel seq scan for slow functions
On Thu, Nov 9, 2017 at 3:47 AM, Amit Kapila wrote:
> I think I understood your concern after some offlist discussion and it
> is primarily due to the inheritance related check which can skip the
> generation of gather paths when it shouldn't.  So what might fit
> better here is a straight check on the number of base rels such that
> allow generating gather path in set_rel_pathlist, if there are
> multiple baserels involved.  I have used all_baserels which I think
> will work better for this purpose.

Yes, that looks a lot more likely to be correct.  Let's see what Tom
thinks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks
On Thu, Nov 9, 2017 at 1:52 PM, Stephen Frost wrote:
>> I disagree that that is, or should be, a guiding principle.
>
> It was what I was using as the basis of the work which I did in this
> area and, at least at that time, there seemed to be little issue with
> that.

Well, there actually kind of was.  It came up here:

http://postgr.es/m/ca+tgmoy6ne5n4jc5awxser2g2gdgr4omf7edcexamvpf_jq...@mail.gmail.com

I mis-remembered who was on which side of the debate, though, hence
the comment about employers.  But never mind about that, since I was
wrong.  Sorry for not checking that more carefully before.

> This is not unlike the discussions we've had in the past around allowing
> non-owners of a table to modify properties of a table, where the
> argument has been successfully and clearly made that if you make the
> ability to change the table a GRANT'able right, then that can be used to
> become the role which owns the table without much difficulty, and so
> there isn't really a point in making that right independently
> GRANT'able.  We have some of that explicitly GRANT'able today due to
> requirements of the spec, but that's outside of our control.

I don't think it's quite the same thing.  I wouldn't go out of my way
to invent grantable table rights that just let you usurp the table
owner's permissions, but I still think it's better to have a uniform
rule that functions we ship don't contain hard-coded superuser checks.
One problem is that which functions allow an escalation to superuser
that is easy enough or reliable enough may not actually be a very
bright line in all cases.  But more than that, I think it's a
legitimate decision to grant to a user a right that COULD lead to a
superuser escalation and trust them not to use it that way, or to
prevent them from using it that way by some means not known to the
database system (e.g. route all queries through pgbouncer and send
them to a teletype; if a breach is detected, go to the teletype room,
read the paper contained therein, and decide who to
fire/imprison/execute at gunpoint).  It shouldn't be our job to decide
that granting a certain right is NEVER ok.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Simplify ACL handling for large objects and removal of superuser() checks
On Thu, Nov 9, 2017 at 1:16 PM, Stephen Frost wrote:
> While we have been working to reduce the number of superuser() checks in
> the backend in favor of having the ability to GRANT explicit rights, one
> of the guideing principles has always been that capabilities which can
> be used to gain superuser rights with little effort should remain
> restricted to the superuser, which is why the lo_import/lo_export hadn't
> been under consideration for superuser-check removal in the analysis I
> provided previously.

I disagree that that is, or should be, a guiding principle.  I'm not
sure that anyone other than you and your fellow employees at Crunchy
has ever agreed that this is some kind of principle.  You make it
sound like there's a consensus about this, but I think there isn't.

I think our guiding principle should be to get rid of ALL of the
hard-coded superuser checks and let people GRANT what they want.  If
they grant a permission that results in somebody escalating to
superuser, they get to keep both pieces.  Such risks might be worth
documenting, but we shouldn't obstruct people from doing it.  In the
same way, Linux will not prevent you from making a binary setuid
regardless of what the binary does.  If you make a binary setuid root
that lets someone hack root, that's your fault, not the operating
system's.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] Faster processing at Gather node
On Thu, Nov 9, 2017 at 12:08 AM, Amit Kapila wrote:
> This change looks suspicious to me.  I think here we can't use the
> tupDesc constructed from targetlist.  One problem, I could see is that
> the check for hasOid setting in tlist_matches_tupdesc won't give the
> correct answer.  In case of the scan, we use the tuple descriptor
> stored in relation descriptor which will allow us to take the right
> decision in tlist_matches_tupdesc.  If you try the statement CREATE
> TABLE as_select1 AS SELECT * FROM pg_class WHERE relkind = 'r'; in
> force_parallel_mode=regress, then you can reproduce the problem I am
> trying to highlight.

I tried this, but nothing seemed to be obviously broken.  Then I
realized that the CREATE TABLE command wasn't using parallelism, so I
retried with parallel_setup_cost = 0, parallel_tuple_cost = 0, and
min_parallel_table_scan_size = 0.  That got it to use parallel query,
but I still don't see anything broken.  Can you clarify further?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Append implementation
On Sat, Oct 28, 2017 at 5:50 AM, Robert Haas wrote:
> No, because the Append node is *NOT* getting copied into shared
> memory.  I have pushed a comment update to the existing functions; you
> can use the same comment for this patch.

I spent the last several days working on this patch, which had a
number of problems both cosmetic and functional.  I think the attached
is in better shape now, but it could certainly use some more review
and testing since I only just finished modifying it, and I modified it
pretty heavily.  Changes:

- I fixed the "morerows" entries in the documentation.  If you had
built the documentation the way you had it and loaded it up in a web
browser, you would have seen that the way you had it was not correct.

- I moved T_AppendState to a different position in the switch inside
ExecParallelReInitializeDSM, so as to keep that switch in the same
order as all of the other switch statements in that file.

- I rewrote the comment for pa_finished.  It previously began with
"workers currently executing the subplan", which is not an accurate
description.  I suspect this was a holdover from a previous version of
the patch in which this was an array of integers rather than an array
of type bool.  I also fixed the comment in ExecAppendEstimate and
added, removed, or rewrote various other comments as well.

- I renamed PA_INVALID_PLAN to INVALID_SUBPLAN_INDEX, which I think is
more clear and allows for the possibility that this sentinel value
might someday be used for non-parallel-aware Append plans.

- I largely rewrote the code for picking the next subplan.  A
superficial problem with the way that you had it is that you had
renamed exec_append_initialize_next to exec_append_seq_next but not
updated the header comment to match.  Also, the logic was spread out
all over the file.  There are three cases: not parallel aware, leader,
worker.  You had the code for the first case at the top of the file
and the other two cases at the bottom of the file, and used multiple
"if" statements to pick the right one in each case.  I replaced all
that with a function pointer stored in the AppendState, moved the code
so it's all together, and rewrote it in a way that I find easier to
understand.  I also changed the naming convention.

- I renamed pappend_len to pstate_len and ParallelAppendDescData to
ParallelAppendState.  I think the use of the word "descriptor" is a
carryover from the concept of a scan descriptor.  There's nothing
really wrong with inventing the concept of an "append descriptor", but
it seems more clear to just refer to shared state.

- I fixed ExecAppendReInitializeDSM not to reset node->as_whichplan.
Per commit 41b0dd987d44089dc48e9c70024277e253b396b7, that's wrong;
instead, local state should be reset in ExecReScanAppend.  I installed
what I believe to be the correct logic in that function instead.

- I fixed list_qsort() so that it copies the type of the old list into
the new list.  Otherwise, sorting a list of type T_IntList or
T_OidList would turn it into just plain T_List, which is wrong.

- I removed get_append_num_workers and integrated the logic into the
callers.  This function was coded quite strangely: it assigned the
return value of fls() to a double and then eventually rounded the
result back to an integer.  But fls() returns an integer, so this
doesn't make much sense.  On a related note, I made it use fls(# of
subpaths) instead of fls(# of subpaths)+1.  Adding 1 doesn't make
sense to me here because it leads to a decision to use 2 workers for a
single, non-partial subpath.  I suspect both of these mistakes stem
from thinking that fls() returns the base-2 logarithm, but in fact it
doesn't, quite: log2(1) = 0.0 but fls(1) = 1.

- In the process of making the changes described in the previous
point, I added a couple of assertions, one of which promptly failed.
It turns out the reason is that your patch didn't update
accumulate_append_subpaths(), which can result in flattening
non-partial paths from a Parallel Append into a parent Append's list
of partial paths, which is bad.  The easiest way to fix that would be
to just teach accumulate_append_subpaths() not to flatten a Parallel
Append into a parent Append or MergeAppend node, but it seemed to me
that there was a fair amount of duplication between
accumulate_partialappend_subpath() and accumulate_append_subpaths(),
so what I did instead is folded all of the necessary logic directly
into accumulate_append_subpaths().  This approach also avoids some
assumptions that your code made, such as the assumption that we will
never have a partial MergeAppend path.

- I changed create_append_path() so that it uses the parallel_aware
argument as the only determinant of whether the output path is flagged
as parallel-aware.  Your version also considered whether
parallel_workers > 0, but I think that's not a good idea; the caller
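On the fls() point: fls() returns the position of the most significant set bit, not the base-2 logarithm, so log2(1) = 0 but fls(1) = 1. A portable sketch of that behavior, plus a worker-count heuristic along the lines described (the max_workers cap is an assumption added for the example, not the patch's actual code):

```c
#include <assert.h>

/*
 * Portable fls(): index of the highest set bit, counting from 1.
 * my_fls(0) == 0, my_fls(1) == 1, my_fls(8) == 4 -- note this is NOT
 * floor(log2(x)), which would be 0 for x == 1 and 3 for x == 8.
 */
static int
my_fls(unsigned int x)
{
    int pos = 0;

    while (x != 0)
    {
        pos++;
        x >>= 1;
    }
    return pos;
}

/*
 * Worker-count heuristic as described above: fls(# of subpaths),
 * without the +1 that wrongly asked for 2 workers for a single
 * subpath, capped at some externally supplied maximum.
 */
static int
append_num_workers(int nsubpaths, int max_workers)
{
    int n = my_fls((unsigned int) nsubpaths);

    return n < max_workers ? n : max_workers;
}
```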
Re: [HACKERS] pageinspect option to forgo buffer locking?
On Thu, Nov 9, 2017 at 12:58 PM, Andres Freund wrote:
> You can already pass arbitrary byteas to heap_page_items(), so I don't
> see how this'd change anything exposure-wise?  Or are you thinking that
> users would continually do this with actual page contents and would be
> more likely to hit edge cases than if using pg_read_binary_file() or
> such (which obviously sees an out of date page)?

More the latter.  It's not really an increase in terms of security
exposure, but it might lead to more trouble in practice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pageinspect option to forgo buffer locking?
On Thu, Nov 9, 2017 at 12:49 PM, Andres Freund wrote:
> Occasionally, when debugging issues, I find it quite useful to be able
> to do a heap_page_items() on a page that's actually locked exclusively
> concurrently.  E.g. investigating the recent multixact vacuuming issues,
> it was very useful to attach a debugger to one backend to step through
> freezing, and display the page in another session.
>
> Currently the locking in get_raw_page_internal() prevents that.  If it's
> an option defaulting to off, I don't see why we couldn't allow that to
> skip locking the page's contents.  Obviously you can get corrupted
> contents that way, but we already allow to pass arbitrary stuff to
> heap_page_items().  Since pinning wouldn't be changed, there's no danger
> of the page being moved out from under us.

heap_page_items() is, if I remember correctly, not necessarily going
to tolerate malformed input very well - I think that's why we restrict
all of these functions to superusers.  So using it in this way would
seem to increase the risk of a server crash or other horrible
misbehavior.  Of course if we're just dev-testing, that doesn't really
matter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Runtime Partition Pruning
On Thu, Nov 9, 2017 at 6:18 AM, Beena Emerson wrote:
> The code still chooses the custom plan instead of the generic plan for
> the prepared statements.  I am working on it.

I don't think it's really the job of this patch to do anything about
that problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Wed, Nov 8, 2017 at 9:40 AM, Masahiko Sawada wrote: > Speaking of the acquiring these four lock types and heavy weight lock, > there obviously is a call path to acquire any of four lock types while > holding a heavy weight lock. In reverse, there also is a call path > that we acquire a heavy weight lock while holding any of four lock > types. The call path I found is that in heap_delete we acquire a tuple > lock and call XactLockTableWait or MultiXactIdWait which eventually > could acquire LOCKTAG_TRANSACTION in order to wait for the concurrent > transactions finish. But IIUC since these functions acquire the lock > for the concurrent transaction's transaction id, deadlocks doesn't > happen. No, that's not right. Now that you mention it, I realize that tuple locks can definitely cause deadlocks. Example: setup: rhaas=# create table foo (a int, b text); CREATE TABLE rhaas=# create table bar (a int, b text); CREATE TABLE rhaas=# insert into foo values (1, 'hoge'); INSERT 0 1 session 1: rhaas=# begin; BEGIN rhaas=# update foo set b = 'hogehoge' where a = 1; UPDATE 1 session 2: rhaas=# begin; BEGIN rhaas=# update foo set b = 'quux' where a = 1; session 3: rhaas=# begin; BEGIN rhaas=# lock bar; LOCK TABLE rhaas=# update foo set b = 'blarfle' where a = 1; back to session 1: rhaas=# select * from bar; ERROR: deadlock detected LINE 1: select * from bar; ^ DETAIL: Process 88868 waits for AccessShareLock on relation 16391 of database 16384; blocked by process 88845. Process 88845 waits for ExclusiveLock on tuple (0,1) of relation 16385 of database 16384; blocked by process 88840. Process 88840 waits for ShareLock on transaction 1193; blocked by process 88868. HINT: See server log for query details. So what I said before was wrong: we definitely cannot exclude tuple locks from deadlock detection. 
However, we might be able to handle the problem in another way: introduce a separate, parallel-query-specific mechanism to avoid having two participants try to update and/or delete the same tuple at the same time - e.g. advertise the BufferTag + offset within the page in DSM, and if somebody else already has that same combination advertised, wait until they no longer do. That shouldn't ever deadlock, because the other worker shouldn't be able to find itself waiting for us while it's busy updating a tuple. After some further study, speculative insertion locks look problematic too. I'm worried about the code path ExecInsert() [taking a speculative insertion lock] -> heap_insert -> heap_prepare_insert -> toast_insert_or_update -> toast_save_datum -> heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock). That sure looks like we can end up waiting for a relation lock while holding a speculative insertion lock, which seems to mean that speculative insertion locks are subject to at least theoretical deadlock hazards as well. Note that even if we were guaranteed to be holding the lock on the toast relation already at this point, it wouldn't fix the problem, because we might still have to build or refresh a relcache entry at this point, which could end up scanning (and thus locking) system catalogs. Any syscache lookup can theoretically take a lock, even though most of the time it doesn't, and thus taking a lock that has been removed from the deadlock detector (or, say, an lwlock) and then performing a syscache lookup with it held is not OK. So I don't think we can remove speculative insertion locks from the deadlock detector either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small improvement to compactify_tuples
On Wed, Nov 8, 2017 at 10:33 AM, Tom Lane wrote: > I do not think there is any change here that can be proven to always be a > win. Certainly the original patch, which proposes to replace an O(n log n) > sort algorithm with an O(n^2) one, should not be thought to be that. > The question to focus on is what's the average case, and I'm not sure how > to decide what the average case is. But more than two test scenarios > would be a good start. I appreciate the difficulties here; I'm just urging caution. Let's not change things just to clear this patch off our plate. Just to throw a random idea out here, we currently have gen_qsort_tuple.pl producing qsort_tuple() and qsort_ssup(). Maybe it could be modified to also produce a specialized qsort_itemids(). That might be noticeably faster than our general-purpose qsort() for the reasons mentioned in the comments in gen_qsort_tuple.pl, viz: # The major effects are (1) inlining simple tuple comparators is much faster # than jumping through a function pointer and (2) swap and vecswap operations # specialized to the particular data type of interest (in this case, SortTuple) # are faster than the generic routines. I don't remember any more just how much faster qsort_tuple() and qsort_ssup() are than plain qsort(), but it was significant enough to convince me to commit 337b6f5ecf05b21b5e997986884d097d60e4e3d0... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Nov 8, 2017 at 7:26 AM, Amit Kapila wrote: > We do want to generate it later when there isn't inheritance involved, > but only if there is a single rel involved (simple_rel_array_size > <=2). The rule is something like this, we will generate the gather > paths at this stage only if there are more than two rels involved and > there isn't inheritance involved. Why is that the correct rule? Sorry if I'm being dense here. I would have thought we'd want to skip it for the topmost scan/join rel regardless of the presence or absence of inheritance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Small improvement to compactify_tuples
On Tue, Nov 7, 2017 at 4:39 PM, Tom Lane wrote: > What I'm getting from the standard pgbench measurements, on both machines, > is that this patch might be a couple percent slower than HEAD, but that is > barely above the noise floor so I'm not too sure about it. Hmm. It seems like slowing down single client performance by a couple of percent is something that we really don't want to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why not parallel seq scan for slow functions
On Tue, Nov 7, 2017 at 9:41 PM, Amit Kapila wrote: > This is required to prohibit generating gather path for top rel in > case of inheritence (Append node) at this place (we want to generate > it later when scan/join target is available). OK, but why don't we want to generate it later when there isn't inheritance involved? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
On Tue, Nov 7, 2017 at 4:35 AM, Haribabu Kommi wrote: > The newly added option is not recommended to be used in normal cases and > it is used only for upgrade utilities. I don't know why it couldn't be used in normal cases. That seems like a totally legitimate thing for somebody to want. Maybe nobody does, but I see no reason to worry if they do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_wal_write statistics view
On Tue, Nov 7, 2017 at 4:31 AM, Haribabu Kommi wrote: >> Updated patch attached. > Patch rebased. I think the earlier concerns about the performance impact of this are probably very valid concerns, and I don't see how the new version of the patch gets us much closer to solving them. I am also not sure I understand how the backend_write_blocks column is intended to work. The only call to pgstat_send_walwrites() is in WalWriterMain, so where do the other backends report anything? Also, if there's only ever one global set of counters (as opposed to one per table, say) then why use the stats collector machinery for this at all, vs. having a structure in shared memory that can be updated directly? It seems like adding a lot of overhead for no functional benefit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix a typo in dsm_impl.c
On Mon, Nov 6, 2017 at 11:22 PM, Masahiko Sawada wrote: > Attached the patch for $subject. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Hash take II
On Tue, Nov 7, 2017 at 4:01 PM, Andres Freund wrote: > + ResourceOwnerEnlargeFiles(CurrentResourceOwner); > + ResourceOwnerRememberFile(CurrentResourceOwner, file); > + VfdCache[file].resowner = CurrentResourceOwner; > > So maybe I'm being pedantic here, but wouldn't the right order be to do > ResourceOwnerEnlargeFiles() *before* creating the file? It's a memory > allocating operation, so it can fail, which'd leak the file. That's not pedantic ... that's a very sound criticism. IMHO, anyway. > diff --git a/src/backend/utils/resowner/resowner.c > b/src/backend/utils/resowner/resowner.c > index 4c35ccf65eb..8b91d5a6ebe 100644 > --- a/src/backend/utils/resowner/resowner.c > +++ b/src/backend/utils/resowner/resowner.c > @@ -528,16 +528,6 @@ ResourceOwnerReleaseInternal(ResourceOwner owner, > PrintRelCacheLeakWarning(res); > RelationClose(res); > } > - > - /* Ditto for dynamic shared memory segments */ > - while (ResourceArrayGetAny(&(owner->dsmarr), &foundres)) > - { > - dsm_segment *res = (dsm_segment *) > DatumGetPointer(foundres); > - > - if (isCommit) > - PrintDSMLeakWarning(res); > - dsm_detach(res); > - } > } > else if (phase == RESOURCE_RELEASE_LOCKS) > { > @@ -654,6 +644,16 @@ ResourceOwnerReleaseInternal(ResourceOwner owner, > PrintFileLeakWarning(res); > FileClose(res); > } > + > + /* Ditto for dynamic shared memory segments */ > + while (ResourceArrayGetAny(&(owner->dsmarr), &foundres)) > + { > + dsm_segment *res = (dsm_segment *) > DatumGetPointer(foundres); > + > + if (isCommit) > + PrintDSMLeakWarning(res); > + dsm_detach(res); > + } > } > > Is that entirely unproblematic? Are there any DSM callbacks that rely on > locks still being held? Please split this part into a separate commit > with such analysis. FWIW, I think this change is a really good idea (I recommended it to Thomas at some stage, I think). 
The current positioning was decided by me at a very early stage of parallel query development where I reasoned as follows: (1) the first thing we're going to implement is going to be parallel quicksort, (2) that's going to allocate a huge amount of DSM, (3) therefore we should try to free it as early as possible. However, I now think that was wrongheaded, and not just because parallel quicksort didn't turn out to be the first thing we developed. Memory is the very last resource we should release when aborting a transaction, because any other resource we have is tracked using data structures that are stored in memory. Throwing the memory away before the end therefore makes life very difficult. That's why, for backend-private memory, we clean up most everything else in AbortTransaction() and then only destroy memory contexts in CleanupTransaction(). This change doesn't go as far, but it's in the same general direction, and I think rightly so. My error was in thinking that the primary use of memory would be for storing data, but really it's about where you put your control structures. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why not parallel seq scan for slow functions
On Mon, Nov 6, 2017 at 9:57 PM, Amit Kapila wrote: >> Well, I suppose that test will fire for a baserel when the total >> number of baserels is at least 3 and there's no inheritance involved. >> But if there are 2 baserels, we're still not the topmost scan/join >> target. > > No, if there are 2 baserels, then simple_rel_array_size will be 3. > The simple_rel_array_size is always the "number of relations" plus > "one". See setup_simple_rel_arrays. Oh, wow. That's surprising. >> Also, even if inheritance is used, we might still be the >> topmost scan/join target. > > Sure, but in that case, it won't generate the gather path here (due to > this part of check "!root->append_rel_list"). I am not sure whether I > have understood the second part of your question, so if my answer > appears inadequate, then can you provide more details on what you are > concerned about? I don't know why the question of why root->append_rel_list is empty is the relevant thing here for deciding whether to generate gather paths at this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
On Mon, Nov 6, 2017 at 5:19 AM, Ashutosh Bapat wrote: > IIRC, only thing that changes between plan time quals and execution > time quals is constaint folding of constant parameters. But I don't > think we change the selectivity estimates when that's done. At the > same time, I don't think we should make a lot of effort to make sure > that the order used during the estimation is same as the order at the > execution; we are anyway estimating. There can always be some > difference between what's estimated and what actually happens. I don't think that's a good justification for allowing the order to vary. It's true that, at execution time, we may find row counts and selectivities to be different than what we predicted, but that is a case of the real data being different from our idealized data -- which is difficult to avoid in general. Allowing our actual decisions to be different from the decisions we thought we would make seems like programmer sloppiness. It would also be very confusing if the planner uses one ordering to estimate the cost and then a different order at execution time and in the EXPLAIN ANALYZE output. How could anybody understand what had happened in such a case? I think it would be a good idea, as Thomas says, to order the qual clauses at an earlier stage and then remember our decision. However, we have to think about whether that's going to increase planning time in a noticeable way. I wonder why we currently postpone this until such a late phase of planning. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada wrote: >>> I suggest that a good thing to do more or less immediately, regardless >>> of when this patch ends up being ready, would be to insert an >>> insertion that LockAcquire() is never called while holding a lock of >>> one of these types. If that assertion ever fails, then the whole >>> theory that these lock types don't need deadlock detection is wrong, >>> and we'd like to find out about that sooner or later. >> >> I understood. I'll check that first. > > I've checked whether LockAcquire is called while holding a lock of one > of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE, > LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summary, I think that > we cannot move these four lock types together out of heavy-weight > lock, but can move only the relation extension lock with tricks. > > Here is detail of the survey. Thanks for these details, but I'm not sure I fully understand. > * LOCKTAG_RELATION_EXTENSION > There is a path that LockRelationForExtension() could be called while > holding another relation extension lock. In brin_getinsertbuffer(), we > acquire a relation extension lock for a index relation and could > initialize a new buffer (brin_initailize_empty_new_buffer()). During > initializing a new buffer, we call RecordPageWithFreeSpace() which > eventually can call fsm_readbuf(rel, addr, true) where the third > argument is "extend". We can process this problem by having the list > (or local hash) of acquired locks and skip acquiring the lock if > already had. For other call paths calling LockRelationForExtension, I > don't see any problem. Does calling fsm_readbuf(rel,addr,true) take some heavyweight lock? Basically, what matters here in the end is whether we can articulate a deadlock-proof rule around the order in which these locks are acquired. The simplest such rule would be "you can only acquire one lock of any of these types at a time, and you can't subsequently acquire a heavyweight lock". 
But a more complicated rule would be OK too, e.g. "you can acquire as many heavyweight locks as you want, and after that you can optionally acquire one page, tuple, or speculative token lock, and after that you can acquire a relation extension lock". The latter rule, although more complex, is still deadlock-proof, because the heavyweight locks still use the deadlock detector, and the rest has a consistent order of lock acquisition that precludes one backend taking A then B while another backend takes B then A. I'm not entirely clear whether your survey leads us to a place where we can articulate such a deadlock-proof rule. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPDATE of partition key
On Wed, Oct 25, 2017 at 11:40 AM, Amit Khandekar wrote: > Below I have addressed the remaining review comments : The changes to trigger.c still make me super-nervous. Hey THOMAS MUNRO, any chance you could review that part? + /* The caller must have already locked all the partitioned tables. */ + root_rel = heap_open(root_relid, NoLock); + *all_part_cols = NULL; + foreach(lc, partitioned_rels) + { + Index rti = lfirst_int(lc); + Oid relid = getrelid(rti, rtables); + Relation part_rel = heap_open(relid, NoLock); + + pull_child_partition_columns(part_rel, root_rel, all_part_cols); + heap_close(part_rel, NoLock); I don't like the fact that we're opening and closing the relation here just to get information on the partitioning columns. I think it would be better to do this someplace that already has the relation open and store the details in the RelOptInfo. set_relation_partition_info() looks like the right spot. +void +pull_child_partition_columns(Relation rel, + Relation parent, + Bitmapset **partcols) This code has a lot in common with is_partition_attr(). I'm not sure it's worth trying to unify them, but it could be done. + * 'num_update_rri' : number of UPDATE per-subplan result rels. For INSERT, Instead of " : ", you could just write "is the". + * For Updates, if the leaf partition is already present in the + * per-subplan result rels, we re-use that rather than initialize a + * new result rel. The per-subplan resultrels and the resultrels of + * the leaf partitions are both in the same canonical order. So while It would be good to explain the reason. Also, Updates shouldn't be capitalized here. + Assert(cur_update_rri <= update_rri + num_update_rri - 1); Maybe just cur_update_rri < update_rri + num_update_rri, or even current_update_rri - update_rri < num_update_rri. Also, +1 for Amit Langote's idea of trying to merge mt_perleaf_childparent_maps with mt_persubplan_childparent_maps. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Client Connection redirection support for PostgreSQL
On Thu, Nov 2, 2017 at 4:33 PM, Craig Ringer wrote: >> Add the ability to the PostgreSQL server instance to route the traffic to a >> different server instance based on the rules defined in server’s pg_bha.conf >> configuration file. At a high level this enables offloading the user >> requests to a different server instance based on the rules defined in the >> pg_hba.conf configuration file. > > pg_hba.conf is "host based access [control]" . I'm not sure it's > really the right place. Well, we could invent someplace else, but I'm not sure I quite see the point (full disclosure: I suggested the idea of doing this via pg_hba.conf in an off-list discussion). I do think the functionality is useful, for the same reasons that HTTP redirects are useful. For example, let's say you have all of your databases for various clients on a single instance. Then, one client starts using a lot more resources, so you want to move that client to a separate instance on another VM. You can set up logical replication to replicate all of the data to the new instance, and then add a pg_hba.conf entry to redirect connections to that database to the new master (this would be even smoother if we had multi-master replication in core). So now that client is moved off to another machine in a completely client-transparent way. I think that's pretty cool. > When this has come up before, one of the issues has been determining > what exactly should constitute "read only" vs "read write" for the > purposes of redirecting work. Yes, that needs some thought. > Backends used just for a redirect would be pretty expensive though. Not as expensive as proxying the whole connection, as pgpool and other systems do today. I think the in-core use of this redirect functionality is useful, but I think the real win would be optionally using it in pgpool and pgbouncer. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
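To make the pg_hba.conf idea above concrete: the entry might look something like the following. This syntax is entirely hypothetical - no "redirect" method exists today, and the host/port option names are invented - it only illustrates the shape such a rule could take for the moved-client scenario described above.

```
# Hypothetical pg_hba.conf entry -- the "redirect" method does not
# exist; this only sketches the proposal.  Clients connecting to
# database "bigclient" would be told to reconnect to the new master.
# TYPE  DATABASE    USER  ADDRESS      METHOD
host    bigclient   all   0.0.0.0/0    redirect host=othervm.example.com port=5432
```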
Re: [HACKERS] why not parallel seq scan for slow functions
On Mon, Nov 6, 2017 at 11:20 AM, Amit Kapila wrote: > On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas wrote: >> This looks like it's on the right track to me. I hope Tom will look >> into it, but if he doesn't I may try to get it committed myself. >> >> -if (rel->reloptkind == RELOPT_BASEREL) >> -generate_gather_paths(root, rel); >> +if (rel->reloptkind == RELOPT_BASEREL && >> +root->simple_rel_array_size > 2 && >> +!root->append_rel_list) >> >> This test doesn't look correct to me. Actually, it doesn't look >> anywhere close to correct to me. So, one of us is very confused... >> not sure whether it's you or me. >> > It is quite possible that I haven't got it right, but it shouldn't be > completely bogus as it stands the regression tests and some manual > verification. Can you explain what is your concern about this test? Well, I suppose that test will fire for a baserel when the total number of baserels is at least 3 and there's no inheritance involved. But if there are 2 baserels, we're still not the topmost scan/join target. Also, even if inheritance is used, we might still be the topmost scan/join target. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Early locking option to parallel backup
On Mon, Nov 6, 2017 at 4:43 AM, Tom Lane wrote: > I wonder if we couldn't somehow repurpose the work that was done for > parallel workers' locks. Lots of security-type issues to be handled > if we're to open that up to clients, but maybe it's solvable. For > instance, maybe only allowing it to clients sharing the same snapshot > would help. Interesting idea. There's a bunch of holes that would need to be patched there; for instance, you can't have one session running DDL while somebody else has AccessShareLock. Parallel query relies on the parallel-mode restrictions to prevent that kind of thing from happening, but it would be strange (and likely somewhat broken) to try to enforce those here. It would be strange and probably bad if LOCK TABLE a; LOCK TABLE b in one session and LOCK TABLE b; LOCK TABLE a in another session failed to deadlock. In short, there's a big difference between a single session using multiple processes and multiple closely coordinated sessions. Also, even if you did it, you still need a lot of PROCLOCKs. Workers don't need to take all locks up front because they can be assured of getting them later, but they've still got to lock the objects they actually want to access. Group locking aims to prevent deadlocks between cooperating processes; it is not a license to skip locking altogether. None of which is to say that the problems don't feel related somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Early locking option to parallel backup
On Sun, Nov 5, 2017 at 5:17 AM, Lucas wrote: > The patch creates a "--lock-early" option which will make pg_dump to issue > shared locks on all tables on the backup TOC on each parallel worker start. > That way, the backup has a very small chance of failing. When it does, > happen in the first few seconds of the backup job. My backup scripts (not > included here) are aware of that and retries the backup in case of failure. I wonder why we don't do this already ... and by default. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Custom compression methods
On Sun, Nov 5, 2017 at 2:22 PM, Oleg Bartunov wrote: >> IIRC there were some concerns about what happened with pg_upgrade, >> with consuming precious toast bits, and a few other things. > > yes, pg_upgrade may be a problem. A basic problem here is that, as proposed, DROP COMPRESSION METHOD may break your database irretrievably. If there's no data compressed using the compression method you dropped, everything is cool - otherwise everything is broken and there's no way to recover. The only obvious alternative is to disallow DROP altogether (or make it not really DROP). Both of those alternatives sound fairly unpleasant to me, but I'm not exactly sure what to recommend in terms of how to make it better. Ideally anything we expose as an SQL command should have a DROP command that undoes whatever CREATE did and leaves the database in an intact state, but that seems hard to achieve in this case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why not parallel seq scan for slow functions
On Sun, Nov 5, 2017 at 12:57 AM, Amit Kapila wrote: > Thanks for the confirmation. Find rebased patch attached. This looks like it's on the right track to me. I hope Tom will look into it, but if he doesn't I may try to get it committed myself. -if (rel->reloptkind == RELOPT_BASEREL) -generate_gather_paths(root, rel); +if (rel->reloptkind == RELOPT_BASEREL && +root->simple_rel_array_size > 2 && +!root->append_rel_list) This test doesn't look correct to me. Actually, it doesn't look anywhere close to correct to me. So, one of us is very confused... not sure whether it's you or me. simple_gather_path = (Path *) create_gather_path(root, rel, cheapest_partial_path, rel->reltarget, NULL, NULL); + +/* Add projection step if needed */ +if (target && simple_gather_path->pathtarget != target) +simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target); Instead of using apply_projection_to_path, why not pass the correct reltarget to create_gather_path? +/* Set or update cheapest_total_path and related fields */ +set_cheapest(current_rel); I wonder if it's really OK to call set_cheapest() a second time for the same relation... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Fri, Nov 3, 2017 at 5:57 PM, Alexander Korotkov wrote: > On Thu, Nov 2, 2017 at 5:56 AM, Robert Haas wrote: >> > I can propose following alternative approach: teach read-only queries on >> > hot >> > standby to tolerate concurrent relation truncation. Therefore, when >> > non-existent heap page is accessed on hot standby, we can know that it >> > was >> > deleted by concurrent truncation and should be assumed to be empty. Any >> > thoughts? >> >> Sounds like it might break MVCC? > > I don't know why it might be broken. VACUUM truncates heap only when tail > to be truncated is already empty. When applying truncate WAL record, > previous WAL records deleting all those tuples in the tail are already > applied. Thus, if even MVCC is broken and we will miss some tuples after > heap truncation, they were anyway were gone before heap truncation. Ah - I was thinking of the TRUNCATE command, rather than truncation by VACUUM. Your argument makes sense, although the case where the relation is truncated and later re-extended might need some thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] Faster processing at Gather node
On Sun, Nov 5, 2017 at 2:24 AM, Andres Freund wrote: >> shm-mq-reduce-receiver-latch-set-v1.patch causes the receiver to only >> consume input from the shared queue when the amount of unconsumed >> input exceeds 1/4 of the queue size. This caused a large performance >> improvement in my testing because it causes the number of times the >> latch gets set to drop dramatically. I experimented a bit with >> thresholds of 1/8 and 1/2 before setting on 1/4; 1/4 seems to be >> enough to capture most of the benefit. > Hm. Is consuming the relevant part, or notifying the sender about it? I > suspect most of the benefit can be captured by updating bytes read (and > similarly on the other side w/ bytes written), but not setting the latch > unless thresholds are reached. The advantage of updating the value, > even without notifying the other side, is that in the common case that > the other side gets around to checking the queue without having blocked, > it'll see the updated value. If that works, that'd address the issue > that we might wait unnecessarily in a number of common cases. I think it's mostly notifying the sender. Sending SIGUSR1 over and over again isn't free, and it shows up in profiling. I thought about what you're proposing here, but it seemed more complicated to implement, and I'm not sure that there would be any benefit. The reason is that, with these patches applied, even a radical expansion of the queue size doesn't produce much incremental performance benefit, at least in the test case I was using. I can increase the size of the tuple queues 10x or 100x and it really doesn't help very much. And consuming sooner (but sometimes without notifying) seems very similar to making the queue slightly bigger. Also, what I see in general is that the CPU usage on the leader goes to 100% but the workers are only maybe 20% saturated. Making the leader work any harder than absolutely necessary therefore seems like it's probably counterproductive. 
I may be wrong, but it looks to me like most of the remaining overhead seems to come from (1) the synchronization overhead associated with memory barriers and (2) backend-private work that isn't as cheap as would be ideal - e.g. palloc overhead. > Interesting. Here it's > +8.79% postgres postgres[.] ExecAgg > +6.52% postgres postgres[.] slot_deform_tuple > +5.65% postgres postgres[.] slot_getattr > +4.59% postgres postgres[.] shm_mq_send_bytes > +3.66% postgres postgres[.] ExecInterpExpr > +3.44% postgres postgres[.] AllocSetAlloc > +3.08% postgres postgres[.] heap_fill_tuple > +2.34% postgres postgres[.] heap_getnext > +2.25% postgres postgres[.] finalize_aggregates > +2.08% postgres libc-2.24.so[.] __memmove_avx_unaligned_erms > +2.05% postgres postgres[.] heap_compare_slots > +1.99% postgres postgres[.] execTuplesMatch > +1.83% postgres postgres[.] ExecStoreTuple > +1.83% postgres postgres[.] shm_mq_receive > +1.81% postgres postgres[.] ExecScan More or less the same functions, somewhat different order. >> I'm probably not super-excited about spending too much more time >> trying to make the _platform_memmove time (only 20% or so of which >> seems to be due to the shm_mq stuff) or the shm_mq_receive_bytes time >> go down until, say, somebody JIT's slot_getattr and slot_deform_tuple. >> :-) > > Hm, let's say somebody were working on something like that. In that case > the benefits for this precise plan wouldn't yet be that big because a > good chunk of slot_getattr calls come from execTuplesMatch() which > doesn't really provide enough context to do JITing (when used for > hashaggs, there is more so it's JITed). Similarly gather merge's > heap_compare_slots() doesn't provide such context. > > It's about ~9% currently, largely due to the faster aggregate > invocation. But the big benefit here would be all the deforming and the > comparisons... I'm not sure I follow you here. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] Faster processing at Gather node
On Sat, Nov 4, 2017 at 5:55 PM, Andres Freund wrote:
>> master: 21436.745, 20978.355, 19918.617
>> patch: 15896.573, 15880.652, 15967.176
>>
>> Median-to-median, that's about a 24% improvement.
>
> Neat!

With the attached stack of 4 patches, I get: 10811.768 ms, 10743.424 ms, 10632.006 ms, about a 49% improvement median-to-median. Haven't tried it on hydra or any other test cases yet.

skip-gather-project-v1.patch does what it says on the tin. I still don't have a test case for this, and I didn't find that it helped very much, but it would probably help more in a test case with more columns, and you said this looked like a big bottleneck in your testing, so here you go.

shm-mq-less-spinlocks-v2.patch is updated from the version I posted before based on your review comments. I don't think it's really necessary to mention that the 8-byte atomics have fallbacks here; whatever needs to be said about that should be said in some central place that talks about atomics, not in each user individually. I agree that there might be some further speedups possible by caching some things in local memory, but I haven't experimented with that.

shm-mq-reduce-receiver-latch-set-v1.patch causes the receiver to only consume input from the shared queue when the amount of unconsumed input exceeds 1/4 of the queue size. This caused a large performance improvement in my testing because it causes the number of times the latch gets set to drop dramatically. I experimented a bit with thresholds of 1/8 and 1/2 before settling on 1/4; 1/4 seems to be enough to capture most of the benefit.

remove-memory-leak-protection-v1.patch removes the memory leak protection that Tom installed upon discovering that the original version of tqueue.c leaked memory like crazy. I think that it shouldn't do that any more, courtesy of 6b65a7fe62e129d5c2b85cd74d6a91d8f7564608. Assuming that's correct, we can avoid a whole lot of tuple copying in Gather Merge and a much more modest amount of overhead in Gather.
Since my test case exercised Gather Merge, this bought ~400 ms or so.

Even with all of these patches applied, there's clearly still room for more optimization, but MacOS's "sample" profiler seems to show that the bottlenecks are largely shifting elsewhere:

Sort by top of stack, same collapsed (when >= 5):
        slot_getattr (in postgres)                                         706
        slot_deform_tuple (in postgres)                                    560
        ExecAgg (in postgres)                                              378
        ExecInterpExpr (in postgres)                                       372
        AllocSetAlloc (in postgres)                                        319
        _platform_memmove$VARIANT$Haswell (in libsystem_platform.dylib)    314
        read (in libsystem_kernel.dylib)                                   303
        heap_compare_slots (in postgres)                                   296
        combine_aggregates (in postgres)                                   273
        shm_mq_receive_bytes (in postgres)                                 272

I'm probably not super-excited about spending too much more time trying to make the _platform_memmove time (only 20% or so of which seems to be due to the shm_mq stuff) or the shm_mq_receive_bytes time go down until, say, somebody JIT's slot_getattr and slot_deform_tuple. :-)

One thing that might be worth doing is hammering on the AllocSetAlloc time. I think that's largely caused by allocating space for heap tuples and then freeing them and allocating space for new heap tuples. Gather/Gather Merge are guilty of that, but I think there may be other places in the executor with the same issue. Maybe we could have fixed-size buffers for small tuples that just get reused and only palloc for large tuples (cf. SLAB_SLOT_SIZE).

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

skip-gather-project-v1.patch
Description: Binary data

shm-mq-less-spinlocks-v2.patch
Description: Binary data

shm-mq-reduce-receiver-latch-set-v1.patch
Description: Binary data

remove-memory-leak-protection-v1.patch
Description: Binary data
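The fixed-size-buffer idea in the last paragraph could look something like the following toy sketch. All names here are invented, and the real executor would use palloc and TupleTableSlot machinery rather than malloc; the point is only that the common small-tuple path does no allocator calls at all.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Invented sketch of a reusable tuple buffer (cf. the SLAB_SLOT_SIZE idea):
 * small tuples land in a fixed inline buffer that is simply reused, and we
 * only hit the allocator for tuples too large to fit.
 */
#define TOY_SLOT_SIZE 128

typedef struct
{
    char    inline_buf[TOY_SLOT_SIZE];  /* reused for small tuples */
    char   *big_buf;                    /* allocated only when needed */
    size_t  big_buf_size;
    int     allocations;                /* malloc call count, for clarity */
} toy_slot;

/* Return storage for a tuple of len bytes, reusing buffers when possible. */
static char *
toy_slot_reserve(toy_slot *slot, size_t len)
{
    if (len <= TOY_SLOT_SIZE)
        return slot->inline_buf;        /* common path: no allocation */
    if (len > slot->big_buf_size)
    {
        free(slot->big_buf);            /* free(NULL) is a no-op */
        slot->big_buf = malloc(len);
        slot->big_buf_size = len;
        slot->allocations++;
    }
    return slot->big_buf;               /* existing large buffer reused */
}
```

A sequence of small reservations returns the same pointer with zero allocator traffic; only an oversized tuple triggers (and thereafter reuses) a heap allocation.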
Re: [HACKERS] [POC] Faster processing at Gather node
On Wed, Oct 18, 2017 at 3:09 AM, Andres Freund wrote:
> 2) The spinlocks both on the sending and receiving side are quite hot:
>
>    rafia query leader:
>    +   36.16%  postgres  postgres  [.] shm_mq_receive
>    +   19.49%  postgres  postgres  [.] s_lock
>    +   13.24%  postgres  postgres  [.] SetLatch

Here's a patch which, as per an off-list discussion between Andres, Amit, and myself, removes the use of the spinlock for most send/receive operations in favor of memory barriers and the atomics support for 8-byte reads and writes.

I tested with a pgbench -i -s 300 database with pgbench_accounts_pkey dropped and max_parallel_workers_per_gather boosted to 10. I used this query:

select aid, count(*) from pgbench_accounts group by 1 having count(*) > 1;

which produces this plan:

 Finalize GroupAggregate  (cost=1235865.51..5569468.75 rows=1000 width=12)
   Group Key: aid
   Filter: (count(*) > 1)
   ->  Gather Merge  (cost=1235865.51..4969468.75 rows=3000 width=12)
         Workers Planned: 6
         ->  Partial GroupAggregate  (cost=1234865.42..1322365.42 rows=500 width=12)
               Group Key: aid
               ->  Sort  (cost=1234865.42..1247365.42 rows=500 width=4)
                     Sort Key: aid
                     ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..541804.00 rows=500 width=4)
(10 rows)

On hydra (PPC), these changes didn't help much. Timings:

master: 29605.582, 29753.417, 30160.485
patch: 28218.396, 27986.951, 26465.584

That's about a 5-6% improvement. On my MacBook, though, the improvement was quite a bit more:

master: 21436.745, 20978.355, 19918.617
patch: 15896.573, 15880.652, 15967.176

Median-to-median, that's about a 24% improvement.

Any reviews appreciated.

Thanks,

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

shm-mq-less-spinlocks-v1.2.patch
Description: Binary data
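For illustration, here is a heavily simplified single-producer/single-consumer byte ring written with C11 atomics. It is not the shm_mq code, and all names are invented; it only shows the general pattern the patch relies on: replacing a spinlock with 8-byte atomic position counters plus acquire/release ordering, so the written bytes become visible before the advanced write position does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy SPSC ring; size is a power of two chosen for this sketch. */
#define TOY_RING_SIZE 256

typedef struct
{
    _Atomic uint64_t write_pos;   /* advanced only by the sender */
    _Atomic uint64_t read_pos;    /* advanced only by the receiver */
    char             data[TOY_RING_SIZE];
} toy_ring;

static bool
toy_ring_send(toy_ring *rb, const char *src, size_t len)
{
    uint64_t w = atomic_load_explicit(&rb->write_pos, memory_order_relaxed);
    uint64_t r = atomic_load_explicit(&rb->read_pos, memory_order_acquire);

    if (TOY_RING_SIZE - (w - r) < len)
        return false;             /* not enough free space */
    for (size_t i = 0; i < len; i++)
        rb->data[(w + i) % TOY_RING_SIZE] = src[i];
    /* publish the bytes before the new write position becomes visible */
    atomic_store_explicit(&rb->write_pos, w + len, memory_order_release);
    return true;
}

static bool
toy_ring_recv(toy_ring *rb, char *dst, size_t len)
{
    uint64_t r = atomic_load_explicit(&rb->read_pos, memory_order_relaxed);
    uint64_t w = atomic_load_explicit(&rb->write_pos, memory_order_acquire);

    if (w - r < len)
        return false;             /* not enough data available */
    for (size_t i = 0; i < len; i++)
        dst[i] = rb->data[(r + i) % TOY_RING_SIZE];
    atomic_store_explicit(&rb->read_pos, r + len, memory_order_release);
    return true;
}
```

Each side writes only its own counter and takes an acquire load of the other's, so no lock is ever held; on platforms without native 8-byte atomics a fallback would be needed, which is exactly the concern the review discussion defers to the central atomics documentation.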
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Nov 3, 2017 at 1:05 PM, Simon Riggs wrote: >> Therefore, if MERGE eventually uses INSERT .. ON CONFLICT >> UPDATE when a relevant unique index exists and does something else, >> such as your proposal of taking a strong lock, or Peter's proposal of >> doing this in a concurrency-oblivious manner, in other cases, then >> those two cases will behave very differently. > > The *only* behavioural difference I have proposed would be the *lack* > of an ERROR in (some) concurrent cases. I think that's a big difference. Error vs. non-error is a big deal by itself; also, the non-error case involves departing from MVCC semantics just as INSERT .. ON CONFLICT UPDATE does. > All I have at the moment is that a few people disagree, but that > doesn't help determine the next action. > > We seem to have a few options for PG11 > > 1. Do nothing, we reject MERGE > > 2. Implement MERGE for unique index situations only, attempting to > avoid errors (Simon OP) > > 3. Implement MERGE, but without attempting to avoid concurrent ERRORs (Peter) > > 4. Implement MERGE, while attempting to avoid concurrent ERRORs in > cases where that is possible. > > Stephen, Robert, please say which option you now believe we should pick. I think Peter has made a good case for #3, so I lean toward that option. I think #4 is too much of a non-obvious behavior difference between the cases where we can avoid those errors and the cases where we can't, and I don't see where #2 can go in the future other than #4. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to implement a SP-GiST index as a extension module?
On Thu, Nov 2, 2017 at 9:53 AM, Connor Wolf wrote: > As such: > Will compound queries as I describe above basically require a custom type to > make it possible? My (admittedly naive) expectation > is that the eventual query for this index will look something like "SELECT * > FROM example_table WHERE indexed_column <=> target_value < 4;", > with "<=>" being the operator for the relevant distance calculation > (hamming, for the BK tree, numeric for the VP-tree). > > The existing VP-tree code appears to not support multiple operators > whatsoever, probably because it was very preliminary. I'm not an expert in this area in any way whatsoever; I don't know a VP-tree from a BK-tree from a maple tree. However, I can tell you that as a general rule, PostgreSQL index access methods can only apply index quals of the form "WHERE column op value" or ordering criteria of the form "ORDER BY column op value". So, in the above example, you might think about trying to set up the access method so that it can efficiently return values ordered by indexed_column <=> target_value and then wrapping the ORDER BY query in a subselect to cut off fetching values at the correct point. But no operator class for any access method can directly handle that query efficiently as you've written it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
On Thu, Nov 2, 2017 at 9:05 PM, Nico Williams wrote: >> Overloading SET to handle both variables and GUCs seems likely to >> create problems, possibly including security problems. For example, >> maybe a security-definer function could leave behind variables to >> trick the calling code into failing to set GUCs that it intended to >> set. Or maybe creating a variable at the wrong time will just break >> things randomly. > > That's already true of GUCs, since there are no access controls on > set_config()/current_setting(). No, it isn't. Right now, SET always refers to a GUC, never a variable, so there's no possibility of getting confused about whether it's intending to change a GUC or an eponymous variable. Once you make SET able to change either one of two different kinds of objects, then that possibility does exist. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 10:26 PM, Peter Geoghegan wrote: > On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas wrote: >> The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where >> I think things get a lot more dangerous. The problem (as Andres >> pointed out to me this afternoon) is that it seems possible to end up >> with a situation where there should be two HOT chains on a page, and >> because of the weakened xmin/xmax checking rules, we end up thinking >> that the second one is a continuation of the first one, which will be >> all kinds of bad news. That would be particularly likely to happen in >> releases from before we invented HEAP_XMIN_FROZEN, when there's no >> xmin/xmax matching at all, but could happen on later releases if we >> are extraordinarily unlucky (i.e. xmin of the first tuple in the >> second chain happens to be the same as the pre-freeze xmax in the old >> chain, probably because the same XID was used to update the page in >> two consecutive epochs). Fortunately, that commit is (I think) not >> released anywhere. > > FWIW, if you look at the second commit > (22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize > that it doesn't even treat those two cases differently. It was buggy > even on its own terms. The FrozenTransactionId test used an xmin from > HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin(). Oh, wow. You seem to be correct. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Oct 31, 2017 at 5:14 PM, Simon Riggs wrote: > I've proposed a SQL Standard compliant implementation that would do > much more than be new syntax over what we already have. > > So these two claims aren't accurate: "radical difference" and "syntax > sugar over a capability we have". I think those claims are pretty accurate. The design of INSERT .. ON CONFLICT UPDATE and the supporting machinery only work if there is a unique index. It provides radically different behavior than what you get with any other DML statement, with completely novel concurrency behavior, and there is no reasonable way to emulate that behavior in cases where no relevant unique index exists. Therefore, if MERGE eventually uses INSERT .. ON CONFLICT UPDATE when a relevant unique index exists and does something else, such as your proposal of taking a strong lock, or Peter's proposal of doing this in a concurrency-oblivious manner, in other cases, then those two cases will behave very differently. And if, in the meantime, MERGE can only handle the cases where there is a unique index, then it can only handle the cases INSERT .. ON CONFLICT UPDATE can cover, which makes it, as far as I can see, syntactic sugar over what we already have. Maybe it's not entirely - you might be planning to make some minor functional enhancements - but it's not clear what those are, and I feel like whatever it is could be done with less work and more elegance by just extending the INSERT .. ON CONFLICT UPDATE syntax. And it does seem to be your intention to only handle the cases which the INSERT .. ON CONFLICT UPDATE infrastructure can cover, because upthread you wrote this: "I didn't say it but my intention was to just throw an ERROR if no single unique index can be identified." I don't think anybody's putting words into your mouth here. We're just reading what you wrote. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] MERGE SQL Statement for PG11
On Thu, Nov 2, 2017 at 8:28 AM, Craig Ringer wrote: > On 1 November 2017 at 01:55, Peter Geoghegan wrote: >> The problem here is: Iff the first statement uses ON CONFLICT >> infrastructure, doesn't the absence of WHEN NOT MATCHED imply >> different semantics for the remaining updates and deletes in the >> second version of the query? You've removed what seems like a neat >> adjunct to the MERGE, but it actually changes everything else too when >> using READ COMMITTED. > > Would these concerns be alleviated by adding some kind of Pg-specific > decoration that constrained concurrency-safe MERGEs? > > So your first statement would be > > MERGE CONCURRENTLY ... > > and when you removed the WHEN NOT MATCHED clause it'd ERROR because > that's no longer able to be done with the same concurrency-safe > semantics? > > I don't know if this would be helpful TBH, or if it would negate > Simon's compatibility goals. Just another idea. Yes, that fixes the problem. Of course, it also turns MERGE CONCURRENTLY into syntactic sugar for INSERT ON CONFLICT UPDATE, which brings one back to the question of exactly what we're trying to achieve here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 8:25 PM, Alvaro Herrera wrote: > Pushed the reverts. > > I noticed while doing so that REL_10_STABLE contains the bogus commits. > Does that change our opinion regarding what to do for people upgrading > to a version containing the broken commits? I don't think so, because > > 1) we hope that not many people will trust their data to 10.0 > immediately after release > 2) the bug is very low probability > 3) it doesn't look like we can do a lot about it anyway. Just to be clear, it looks like "Fix freezing of a dead HOT-updated tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0 was stamped, but "Fix traversal of half-frozen update chains" (22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is therefore unreleased at present. Users of 10.0 who hit the code introduced by 46c35116ae1acc8826705ef2a7b5d9110f9d6e84 will have XIDs stored in the xmax fields of tuples that predate relfrozenxid. Those tuples will be hinted-committed. That's not good, but it might not really have much in the way of consequences. *IF* the next VACUUM doesn't get confused by the old XID, then it will prune the tuple then and I think we'll be OK. And I think it won't, because it should just call HeapTupleSatisfiesVacuum() and that should see that HEAP_XMAX_COMMITTED is set and not actually try to consult the old CLOG. If that hint bit can ever get lost - or fail to propagate to a standby - then we have more trouble, but the fact that it's set by a logged operation makes me hope that can't happen. Furthermore, that follow-on VACUUM should indeed arrive in due time, because we will not have marked the page all-visible -- HeapTupleSatisfiesVacuum() will NOT have returned HEAPTUPLE_LIVE when called from lazy_scan_heap(), and therefore we will have set all_visible = false. The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where I think things get a lot more dangerous. 
The problem (as Andres pointed out to me this afternoon) is that it seems possible to end up with a situation where there should be two HOT chains on a page, and because of the weakened xmin/xmax checking rules, we end up thinking that the second one is a continuation of the first one, which will be all kinds of bad news. That would be particularly likely to happen in releases from before we invented HEAP_XMIN_FROZEN, when there's no xmin/xmax matching at all, but could happen on later releases if we are extraordinarily unlucky (i.e. xmin of the first tuple in the second chain happens to be the same as the pre-freeze xmax in the old chain, probably because the same XID was used to update the page in two consecutive epochs). Fortunately, that commit is (I think) not released anywhere. Personally, I think it would be best to push the release out a week. I think we understand this well enough now that we can fix it relatively easily, but haste makes bugs, and (I know you're all tired of hearing me say this) patches that implicate the on-disk format are scary. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: schema variables
On Thu, Oct 26, 2017 at 12:51 PM, Pavel Stehule wrote: > The variables can be modified by SQL command SET (this is taken from > standard, and it natural) > > SET varname = expression; Overloading SET to handle both variables and GUCs seems likely to create problems, possibly including security problems. For example, maybe a security-definer function could leave behind variables to trick the calling code into failing to set GUCs that it intended to set. Or maybe creating a variable at the wrong time will just break things randomly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
On Thu, Nov 2, 2017 at 4:50 PM, Andres Freund wrote: > I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in > the face of previously corrupted tuple chains and pg_upgraded clusters - > it can lead to tuples being considered related, even though they they're > from entirely independent hot chains. Especially when upgrading 9.3 post > your fix, to current releases. I think this is a key point. If the new behavior were merely not entirely correct, we could perhaps refine it later. But it's not only not correct - it actually has the potential to create new problems that didn't exist before those commits. And if we release without reverting those commits then we can't change our mind later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add some const decorations to prototypes
On Tue, Oct 31, 2017 at 8:26 PM, Tom Lane wrote: > ... but I'm not sure that it's an improvement in cases where you have to > cast away the const somewhere else. I agree. I guess I may be in the minority here but I don't really like decorating things with const too much because I have tended to find that once you start adding const, you end up having to add it in more and more places to avoid warnings, and then eventually that causes you to have to start casting it away. Perhaps I was just Doing It Wrong. Anyway, I don't see much point in having const if you're just going to have to cast it to non-const. Then it wasn't really very const in the first place... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On Thu, Nov 2, 2017 at 1:52 PM, Ashutosh Bapat wrote:
> Right now partition keys are immutable but we don't have much code
> written with that assumption. All the code usually keeps a lock on the
> parent till the time they use the information in the partition key. In
> a distant future, which may not exist, we may support ALTER TABLE ...
> PARTITION BY to change partition keys (albeit at huge cost of data
> movement). If we do that, we will have to remember this one-off
> instance of code which assumes that the partition keys are immutable.

I am pretty sure this is by no means the only piece of code which assumes that.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [POC] hash partitioning
On Thu, Nov 2, 2017 at 1:45 PM, amul sul wrote:
> Yes, you are correct, columns involved in the partitioning are immutable.
>
> I was just worried about any change in the partition key column that
> might change the selected hash function.

Any such change, even if it were allowed, would have to take AccessExclusiveLock on the child.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Walsender timeouts and large transactions
On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek wrote:
> sorry for the delay but I didn't have much time in past weeks to follow
> this thread.

+       TimestampTz now = GetCurrentTimestamp();
+
        /* output previously gathered data in a CopyData packet */
        pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);

        /*
         * Fill the send timestamp last, so that it is taken as late as possible.
         * This is somewhat ugly, but the protocol is set as it's already used for
         * several releases by streaming physical replication.
         */
        resetStringInfo(&tmpbuf);
-       pq_sendint64(&tmpbuf, GetCurrentTimestamp());
+       pq_sendint64(&tmpbuf, now);
        memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64));

This change falsifies the comments: the send timestamp is no longer taken as late as possible. Maybe initialize now just after resetStringInfo() is done.

-       /* fast path */
-       /* Try to flush pending output to the client */
-       if (pq_flush_if_writable() != 0)
-               WalSndShutdown();
+       /* Try taking fast path unless we get too close to walsender timeout. */
+       if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+                                             wal_sender_timeout / 2))
+       {
+               CHECK_FOR_INTERRUPTS();

-       if (!pq_is_send_pending())
-               return;
+               /* Try to flush pending output to the client */
+               if (pq_flush_if_writable() != 0)
+                       WalSndShutdown();
+
+               if (!pq_is_send_pending())
+                       return;
+       }

I think it's only the if (!pq_is_send_pending()) return; that needs to be conditional here, isn't it? The pq_flush_if_writable() can be done unconditionally.

Other than that this looks like a reasonable change to me, but I'm not an expert on this code.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
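To make the proposed control flow concrete, here is a standalone toy version of the fast-path test in the second hunk. Types and units are simplified (TimestampTz-style microseconds assumed) and the function name is invented; it only demonstrates the "within half the walsender timeout since the last reply" condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microsecond timestamps, in the style of TimestampTz (sketch only). */
typedef int64_t ToyTimestampTz;

/*
 * Take the non-blocking fast path only while we are still within half of
 * wal_sender_timeout since the last reply; otherwise fall through to the
 * slow path that can process replies and detect timeout.
 */
static bool
toy_can_take_fast_path(ToyTimestampTz now,
                       ToyTimestampTz last_reply_timestamp,
                       int wal_sender_timeout_ms)
{
    ToyTimestampTz deadline = last_reply_timestamp
        + (ToyTimestampTz) (wal_sender_timeout_ms / 2) * 1000;

    return now < deadline;
}
```

With a 60 s timeout, the fast path stays available for 30 s after the last reply and is refused afterwards; Robert's review point is that the flush itself need not be inside this conditional, only the early return.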
Re: [HACKERS] [POC] hash partitioning
On Wed, Nov 1, 2017 at 3:46 PM, amul sul wrote: > Although partition constraints become more simple, there isn't any performance > gain with 0005 patch. Also I am little skeptic about logic in 0005 where we > copied extended hash function info from the partition key, what if parent is > changed while we are using it? Do we need to keep lock on parent until commit > in > satisfies_hash_partition? I don't think it should be possible for the parent to be changed. I mean, the partition key is altogether immutable -- it can't be changed after creation time. The partition bounds can be changed for individual partitions but that would require a lock on the partition. Can you give an example of the kind of scenario about which you are concerned? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Tue, Oct 31, 2017 at 2:47 PM, Alexander Korotkov wrote:
> However, from the user's perspective, current behavior of
> hot_standby_feedback is just broken, because it both increases bloat and
> doesn't guarantee that a read-only query on standby wouldn't be cancelled
> because of vacuum. Therefore, we should be looking for a solution: if one
> approach isn't good enough, then we should look for another approach.
>
> I can propose the following alternative approach: teach read-only queries on hot
> standby to tolerate concurrent relation truncation. Then, when a
> non-existent heap page is accessed on hot standby, we can know that it was
> deleted by concurrent truncation and should be assumed to be empty. Any
> thoughts?

Sounds like it might break MVCC?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Dynamic result sets from procedures
On Wed, Nov 1, 2017 at 2:38 AM, Peter Eisentraut wrote: > So this is what it can do: > > CREATE PROCEDURE pdrstest1() > LANGUAGE SQL > AS $$ > DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2; > DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3; > $$; > > CALL pdrstest1(); > > and that returns those two result sets to the client. That seems like it is at least arguably a wire protocol break. Today, if you send a string containing only one command, you will only get one answer. I'm not saying that makes this change utterly unacceptable or anything -- but I wonder how much application code it will break, and whether any steps need to be taken to reduce breakage. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
On Thu, Nov 2, 2017 at 8:01 AM, Craig Ringer wrote: > That forces materialization, and I'm guessing part of Tomas's goal > here is to prevent the need to materialize into a temp table / > tuplestore / etc. I get that, but if you're running a query like "SELECT * FROM bigtable", you don't need parallel query in the first place, because a single backend is quite capable of sending back the rows as fast as a client can read them. If you're running a query like "SELECT * FROM bigtable WHERE " then that's a good use case for parallel query, but then materializing it isn't that bad because the result set is a lot smaller than the original table. I am not disputing the idea that there are *some* cases where parallel query is useful and materialization is still undesirable, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise aggregation/grouping
On Wed, Nov 1, 2017 at 6:20 PM, Jeevan Chalke wrote:
> Yep.
> But as David reported earlier, if we remove the first part i.e. adding
> cpu_operator_cost per tuple, Merge Append will be preferred over an Append
> node unlike before. And thus, I thought of better having both, but am not so
> sure. Should we remove that part altogether, or add both in a single
> statement with updated comments?

I was only suggesting that you update the comments.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
On Wed, Nov 1, 2017 at 3:47 AM, Tomas Vondra wrote: > But maybe there's a simpler option - what if we only allow fetches from > the PARALLEL cursor while the cursor is open? That is, this would work: > > BEGIN; > ... > DECLARE x PARALLEL CURSOR FOR SELECT * FROM t2 WHERE ...; > FETCH 1000 FROM x; > FETCH 1000 FROM x; > FETCH 1000 FROM x; > CLOSE x; > ... > COMMIT; > > but adding any other command between the OPEN/CLOSE commands would fail. > That should close all the holes with parallel-unsafe stuff, right? I think that still leaves a fair number of scenarios to consider, and the error handling by itself seems pretty thorny. Plus it's kind of a weird mode and, like Craig, I'm not really sure what it gets you. Maybe if somebody has the use case where this would help, they should just do: CREATE TEMP TABLE x AS SELECT * FROM t2 WHERE ...; DECLARE x CURSOR FOR SELECT * FROM x; Since e9baa5e9fa147e00a2466ab2c40eb99c8a700824, that ought to work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)
On Wed, Nov 1, 2017 at 7:49 AM, Craig Ringer wrote:
> If the client wants to fetch in chunks it can use a portal and limited
> size fetches. That shouldn't (?) be parallel-unsafe, since nothing
> else can happen in the middle anyway.

I believe sending a limited-size fetch forces serial execution currently. If it's true that nothing else can happen in the middle then we could relax that, but I don't see why that should be true?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw
On Wed, Oct 4, 2017 at 5:58 PM, Ashutosh Bapat wrote: > The view with WCO is local but the modification which violates WCO is > being made on remote server by a trigger on remote table. Trying to > control that doesn't seem to be a good idea, just like we can't > control what rows get inserted on the foreign server when they violate > local constraints. I think that's a fair point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: restrict pg_rewind to whitelisted directories
On Mon, Oct 30, 2017 at 6:44 PM, Chris Travers wrote: > The attached patch is cleaned up and filed for the commit fest this next > month: It's generally better to post the patch on the same message as the discussion thread, or at least link back to the discussion thread from the new one. Otherwise people may have trouble understanding the history. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPLAIN (ANALYZE, BUFFERS) reports bogus temporary buffer reads
On Tue, Oct 17, 2017 at 2:29 AM, Thomas Munro wrote:
> Vik Fearing asked off-list why hash joins appear to read slightly more
> temporary data than they write. The reason is that we notch up a
> phantom block read when we hit the end of each file. Harmless but it
> looks a bit weird and it's easily fixed.
>
> Unpatched, a 16 batch hash join reports that we read 30 more blocks
> than we wrote (2 per batch after the first, as expected):
>
>   Buffers: shared hit=434 read=16234, temp read=5532 written=5502
>
> With the attached patch:
>
>   Buffers: shared hit=547 read=16121, temp read=5502 written=5502

Committed. Arguably we ought to back-patch this, but it's minor so I didn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GUC for cleanup indexes threshold.
On Mon, Oct 16, 2017 at 3:02 PM, Masahiko Sawada wrote: > I guess that is the patch I proposed. However I think that there still > is room for discussion because the patch cannot skip to cleanup vacuum > when aggressive vacuum, which is one of the situation that I really > wanted to skip. Well, I think there are outstanding concerns that the patch in question is not safe, and I don't see that anything has been done to resolve them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ERROR: MultiXactId 3268957 has not been created yet -- apparent wraparound after missused pg_resetxlogs
On Mon, Oct 16, 2017 at 5:41 PM, alain radix wrote: > I’m facing a problem with a PostgreSQL 9.6.2 reporting this error when > selecting a table > > ERROR: MultiXactId 3268957 has not been created yet -- apparent wraparound > > The root cause is not a Postgres bug but a buggy person that used > pg_resetxlogs obviously without reading or understanding the documentation. Hmm, I hope you patched that person... > I’m trying to extract data from the db to create a new one ;-) > > But pg_dump logically end with the same error. > > I’ve seen the row with t_max = 3268957 page_inspect, I might use it to > extract row from the page, but this will not be quite easy. Not sure if this would work, but maybe try BEGIN ... UPDATE the row, perhaps via TID qual ... ROLLBACK? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Mon, Oct 16, 2017 at 5:03 PM, Ashutosh Bapat wrote:
> set_append_rel_size() crashes when it encounters a partitioned table
> with a dropped column. Dropped columns do not have any translations
> saved in AppendRelInfo::translated_vars; the corresponding entry is NULL
> per make_inh_translation_list():
>
> 1802         att = TupleDescAttr(old_tupdesc, old_attno);
> 1803         if (att->attisdropped)
> 1804         {
> 1805             /* Just put NULL into this list entry */
> 1806             vars = lappend(vars, NULL);
> 1807             continue;
> 1808         }
>
> In set_append_rel_size() we try to compute attr_needed for child tables.
> While doing so we try to translate a user attribute number of the parent
> to that of a child, and crash since the translated Var is NULL. Here's a
> patch to fix the crash. The patch also contains a testcase to test dropped
> columns in a partitioned table.
>
> Sorry for not noticing this problem earlier.

OK, committed. This is a good example of how having good code coverage doesn't necessarily mean you've found all the bugs. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
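[Editor's note: the shape of the fix can be illustrated with a stand-alone sketch. The `Var` struct and the counting function below are hypothetical stand-ins for illustration only; the real code walks the `translated_vars` list of an `AppendRelInfo`.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's Var node. */
typedef struct Var
{
	int			varattno;
} Var;

/*
 * Sketch of the pattern the fix enforces: when walking a parent-to-child
 * attribute translation list, entries for dropped columns are NULL (per
 * make_inh_translation_list) and must be skipped, never dereferenced.
 */
static int
count_translatable_attrs(Var **translated_vars, int natts)
{
	int			count = 0;

	for (int i = 0; i < natts; i++)
	{
		if (translated_vars[i] == NULL)
			continue;			/* dropped column: no child translation */
		count++;
	}
	return count;
}
```

The crash happened precisely because a consumer dereferenced one of those NULL entries without such a check.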
Re: [HACKERS] Current int & float overflow checking is slow.
On Mon, Oct 30, 2017 at 11:24 PM, Andres Freund wrote:
>>> Perhaps it should rather be pg_add_s32_overflow, or a similar
>>> naming scheme?
>>
>> Not sure what the s is supposed to be? Signed?
>
> Yes, signed. So we could add a u32 or something complementing the
> functions already in the patch. Even though overflow checks are a heck
> of a lot easier to write for unsigned ints, the intrinsics are still
> faster. I don't have any sort of strong feelings on the naming.

Right, I guess including the s is probably a good idea then.

>> I suggest that if we think we don't need -fwrapv any more, we ought to
>> remove it. Otherwise, we won't find out if we're wrong.
>
> I agree that we should do so at some point not too far away in the
> future. Not the least because we don't specify this kind of C dialect in
> a lot of other compilers. Additionally the flag causes some slowdown
> (because e.g. for loop variables are optimized less). But I'm fairly
> certain it needs a bit more care than I've invested as of now - should
> probably at least compile with -Wstrict-overflow=some-higher-level, and
> with ubsan. I'm fairly certain there's more bogus overflow checks
> around...

Makes sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
On Mon, Oct 30, 2017 at 5:52 PM, amul sul wrote:
> Actually, int4[] is also inappropriate type as we have started using a 64bit
> hash function. We need something int8[] which is not available, so that I
> have used ANYARRAYOID in the attached patch (0004).

I don't know why you think int8[] is not available.

rhaas=# select 'int8[]'::regtype;
 regtype
----------
 bigint[]
(1 row)

>> I wrote the following query to detect problems of this type, and I think
>> we might want to just go ahead and add this to the regression test suite,
>> verifying that it returns no rows:
>>
>> select oid::regprocedure, provariadic::regtype, proargtypes::regtype[]
>> from pg_proc where provariadic != 0
>> and case proargtypes[array_length(proargtypes, 1)-1]
>>     when 2276 then 2276 -- any -> any
>>     when 2277 then 2283 -- anyarray -> anyelement
>>     else (select t.oid from pg_type t where t.typarray =
>>           proargtypes[array_length(proargtypes, 1)-1]) end
>> != provariadic;
>
> Added in 0001 patch.

Committed.

> One advantage of current implementation is that we can see which hash
> function are used for the each partitioning column and also we don't need to
> worry about user specified opclass and different input types.
>
> Something similar I've tried in my initial patch version[1], but I have missed
> user specified opclass handling for each partitioning column. Do you want me
> to handle opclass using RelabelType node? I am afraid that, that would make
> the \d+ output more horrible than the current one if non-default opclass used.

Maybe we should just pass the OID of the partition (or both the partition and the parent, so we can get the lock ordering right?) instead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Current int & float overflow checking is slow.
On Mon, Oct 30, 2017 at 4:57 PM, Andres Freund wrote:
> 0001) Introduces pg_{add,sub,mul}{16,32,64}_overflow(a, b, *result)
>       These use compiler intrinsics on gcc/clang. If that's not
>       available, they cast to a wider type and do overflow checks. For
>       64bit there's a fallback for the case 128bit math is not
>       available (here I stole from an old patch of Greg's).
>
>       These fallbacks are, as far as I can tell, C free of overflow
>       related undefined behaviour.

Looks nice.

> Perhaps it should rather be pg_add_s32_overflow, or a similar
> naming scheme?

Not sure what the s is supposed to be? Signed?

> 0002) Converts int.c, int8.c and a smattering of other functions to use
>       the new facilities. This removes a fair amount of code.
>
>       It might make sense to split this up further, but right now that's
>       the set of functions that either are affected performancewise by
>       previous overflow checks, and/or relied on wraparound
>       overflow. There's probably more places, but this is what I found
>       by visual inspection and compiler warnings.

I lack the patience to review this tonight.

> 0003) Removes -fwrapv. I'm *NOT* suggesting we apply this right now, but
>       it seems like an important test for the new facilities. Without
>       0002, tests would fail after this; after it, all tests run
>       successfully.

I suggest that if we think we don't need -fwrapv any more, we ought to remove it. Otherwise, we won't find out if we're wrong.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
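[Editor's note: for readers following along, the intrinsic-plus-wider-type scheme described in 0001 can be sketched roughly as below. The function name matches the proposed naming convention, but the body is an illustration under stated assumptions, not the patch's actual code; the real functions ended up in PostgreSQL's common/int.h and use the project's own typedefs.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: add two signed 32-bit ints, storing the sum in *result.
 * Returns true if the addition overflowed, false otherwise.
 */
static inline bool
pg_add_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
#if defined(__GNUC__) || defined(__clang__)
	/* gcc/clang intrinsic: stores the sum and reports overflow directly */
	return __builtin_add_overflow(a, b, result);
#else
	/* Fallback: compute in a wider type, then range-check the result. */
	int64_t		sum = (int64_t) a + (int64_t) b;

	if (sum < INT32_MIN || sum > INT32_MAX)
		return true;
	*result = (int32_t) sum;
	return false;
#endif
}
```

Both branches avoid signed-overflow undefined behaviour: the intrinsic by definition, and the fallback because the arithmetic happens in int64_t, where the operands cannot overflow.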
Re: [HACKERS] Remove secondary checkpoint
On Mon, Oct 30, 2017 at 7:04 PM, Alvaro Herrera wrote: >> A sort of middle way would be to keep the secondary checkpoint around >> but never try to replay from that point, or only if a specific flag is >> provided. > > Why do you want to keep the secondary checkpoint? If there is no way to > automatically start a recovery from the prior checkpoint, is it really > possible to do the same manually? I think the only advantage of keeping > it is that the WAL files are kept around for a little bit longer. But > is that useful? Surely for any environment where you really care, you > have a WAL archive somewhere, so it doesn't matter if files are removed > from the primary's pg_xlog dir. (apologies for the empty message) I don't really want anything in particular here, other than for the system to be reliable. If we're confident that there's zero value in the secondary checkpoint then, sure, ditch it. Even if you have the older WAL files around in an archive, it doesn't mean that you know where the previous checkpoint start location is ... but actually, come to think of it, if you did need to know that, you could just run pg_waldump to find it. That probably wasn't true when this code was originally written, but it is today. I was mostly just thinking out loud, listing another option rather than advocating for it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove secondary checkpoint
On Mon, Oct 30, 2017 at 7:04 PM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Tue, Oct 24, 2017 at 7:25 PM, Andres Freund wrote:
>>> I think it does the contrary. The current mechanism is, in my opinion,
>>> downright dangerous:
>>> https://www.postgresql.org/message-id/20160201235854.go8...@awork2.anarazel.de
>>
>> A sort of middle way would be to keep the secondary checkpoint around
>> but never try to replay from that point, or only if a specific flag is
>> provided.
>
> Why do you want to keep the secondary checkpoint? If there is no way to
> automatically start a recovery from the prior checkpoint, is it really
> possible to do the same manually? I think the only advantage of keeping
> it is that the WAL files are kept around for a little bit longer. But
> is that useful? Surely for any environment where you really care, you
> have a WAL archive somewhere, so it doesn't matter if files are removed
> from the primary's pg_xlog dir.
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove secondary checkpoint
On Tue, Oct 24, 2017 at 7:25 PM, Andres Freund wrote: > I think it does the contrary. The current mechanism is, in my opinion, > downright dangerous: > https://www.postgresql.org/message-id/20160201235854.go8...@awork2.anarazel.de A sort of middle way would be to keep the secondary checkpoint around but never try to replay from that point, or only if a specific flag is provided. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov wrote: > Hello. I made some bugfixes and rewrite the patch. I don't think it's a good idea to deliberately leave the state of the standby different from the state of the master on the theory that it won't matter. I feel like that's something that's likely to come back to bite us. Giving LockAcquireExtended() an argument that causes some AccessExclusiveLocks not all to be logged also seems pretty ugly, especially because there are no comments whatsoever explaining the rationale. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE SQL Statement for PG11
On Sun, Oct 29, 2017 at 1:19 AM, Simon Riggs wrote: > Nothing I am proposing blocks later work. That's not really true. Nobody's going to be happy if MERGE has one behavior in one set of cases and an astonishingly different behavior in another set of cases. If you adopt a behavior for certain cases that can't be extended to other cases, then you're blocking a general-purpose MERGE. And, indeed, it seems that you're proposing an implementation that adds no new functionality, just syntax compatibility. Do we really want or need two syntaxes for the same thing in core? I kinda think Peter might have the right idea here. Under his proposal, we'd be getting something that is, in a way, new. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallelize queries containing initplans
On Mon, Oct 30, 2017 at 9:00 AM, Amit Kapila wrote: > Now that the PARAM_EXTERN issue is fixed, I have rebased this patch. > This patch had been switched to Ready For Committer in last CF, then > Robert had comments which I have addressed, so I think the status > should be switched back to Ready For committer. Let me know if you > think it should be switched to some other status. The change to ExplainPrintPlan doesn't look good to me, because it actually moves the initPlan; I don't think it's good for EXPLAIN to mutate the plan state tree. It should find a way to display the results *as if* the initPlans were attached to the subnode, but without actually moving them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel worker error
On Mon, Oct 30, 2017 at 8:25 AM, Amit Kapila wrote: > Thanks. I have closed this entry in CF app, however, I am not sure > what is the best way to deal with the entry present in PostgreSQL 10 > Open Items page[1]. The entry for this bug seems to be present in > Older Bugs section. Shall we leave it as it is or do we want to do > something else with it? > > [1] - https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items How about just adding an additional bullet point for that item that says "fixed by commit blah blah for 10.1"? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel safety for extern params
On Sat, Oct 28, 2017 at 8:02 PM, Amit Kapila wrote: > I think we need to make changes in exec_simple_recheck_plan to make > the behavior similar to HEAD. With the attached patch, all tests > passed with force_parallel_mode. Committed to REL_10_STABLE. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
On Sat, Oct 28, 2017 at 4:52 PM, Chris Travers wrote: > There are still some cleanup bits needed here but I wanted to get feedback > on my general direction. > > I hope to submit for commit fest soon if the general feedback is good. I think you should submit to the CommitFest regardless, to increase your chances of getting feedback. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel worker error
On Sun, Oct 29, 2017 at 12:02 PM, Amit Kapila wrote: > This patch no longer applies, so attached rebased patches. I have > also created patches for v10 as we might want to backpatch the fix. > Added the patch in CF (https://commitfest.postgresql.org/15/1342/) Thanks. I picked the second variant, committed, and also back-patched to 9.6. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implementing pg_receivewal --no-sync
On Sun, Oct 29, 2017 at 3:42 AM, Michael Paquier wrote: > Okay. Here is an updated patch incorporating those comments. Committed with a little wordsmithing on the documentation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] hash partitioning
...between WITH and the parenthesis which follows.

In 0003, the changes to partition_bounds_copy claim that I shouldn't worry about the fact that typlen is set to 4 because datumCopy won't use it for a pass-by-value datatype, but I think that calling functions with incorrect arguments and hoping that they ignore them and therefore nothing bad happens doesn't sound like a very good idea. Fortunately, I think the actual code is fine; I think we just need to change the comments. For hash partitioning, the datums array always contains two integers, which are of type int4, which is indeed a pass-by-value type of length 4 (note that if we were using int8 for the modulus and remainder, we'd need to set byval to FLOAT8PASSBYVAL). I would just write this as:

    if (hash_part)
    {
        typlen = sizeof(int32);     /* always int4 */
        byval = true;               /* int4 is pass-by-value */
    }

+   for (i = 0; i < nkeys; i++)
+   {
+       if (!isnull[i])
+           rowHash = hash_combine64(rowHash, DatumGetUInt64(hash_array[i]));
+   }

Excess braces. I think it might be better to inline the logic in mix_hash_value() into each of the two callers. Then, the callers wouldn't need Datum hash_array[PARTITION_MAX_KEYS]; they could just fold each new hash value into a uint64 value. That seems likely to be slightly faster and I don't see any real downside.

rhaas=# create table natch (a citext, b text) partition by hash (a);
ERROR:  XX000: missing support function 2(16398,16398) in opfamily 16437
LOCATION:  RelationBuildPartitionKey, relcache.c:954

It shouldn't be possible to reach an elog() from SQL, and this is not a friendly error message.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
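[Editor's note: the fold-as-you-go approach suggested above might look roughly like the stand-alone sketch below. hash_combine64() is modeled on the function of that name in PostgreSQL's hash utility header; compute_partition_hash() and its arguments are hypothetical stand-ins for the real calling code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modeled on PostgreSQL's hash_combine64() (a boost-style 64-bit mixer). */
static inline uint64_t
hash_combine64(uint64_t a, uint64_t b)
{
	a ^= b + UINT64_C(0x49a0f4dd15e5a8e3) + (a << 54) + (a >> 7);
	return a;
}

/*
 * Sketch of the suggested inlining: fold each column's hash straight into
 * a running uint64 instead of staging the values in a
 * Datum hash_array[PARTITION_MAX_KEYS] first.
 */
static uint64_t
compute_partition_hash(const uint64_t *col_hashes, const bool *isnull, int nkeys)
{
	uint64_t	rowHash = 0;

	for (int i = 0; i < nkeys; i++)
	{
		/* NULL columns contribute nothing, as in the quoted loop */
		if (!isnull[i])
			rowHash = hash_combine64(rowHash, col_hashes[i]);
	}
	return rowHash;
}
```

Dropping the intermediate array saves a copy per key and keeps the working state in a single register-sized value.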
Re: [HACKERS] Fix typo in blvacuum.c
On Tue, Oct 17, 2017 at 3:48 AM, Seki, Eiji wrote: > I found a typo in comment of blvacuum.c, so attach the patch for it. Thanks. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Thu, Oct 19, 2017 at 9:05 AM, Amit Khandekar wrote: >> + *ExecAppendEstimate >> + * >> + *estimates the space required to serialize Append node. >> >> Ugh, this is wrong, but I notice it follows various other >> equally-wrong comments for other parallel-aware node types. I guess >> I'll go fix that. We are not in serializing the Append node. > > I didn't clealy get this. Do you think it should be "space required to > copy the Append node into the shared memory" ? No, because the Append node is *NOT* getting copied into shared memory. I have pushed a comment update to the existing functions; you can use the same comment for this patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise aggregation/grouping
On Fri, Oct 27, 2017 at 1:01 PM, Jeevan Chalke wrote:
> 1. Added separate patch for costing Append node as discussed up-front in the
>    patch-set.
> 2. Since we now cost Append node, we don't need partition_wise_agg_cost_factor
>    GUC. So removed that. The remaining patch hence merged into main
>    implementation patch.
> 3. Updated rows in test-cases so that we will get partition-wise plans.

With 0006 applied, cost_merge_append() is now a little bit confused:

    /*
     * Also charge a small amount (arbitrarily set equal to operator cost) per
     * extracted tuple.  We don't charge cpu_tuple_cost because a MergeAppend
     * node doesn't do qual-checking or projection, so it has less overhead
     * than most plan nodes.
     */
    run_cost += cpu_operator_cost * tuples;

    /* Add MergeAppend node overhead like we do it for the Append node */
    run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;

The first comment says that we don't add cpu_tuple_cost, and the second one then adds half of it anyway.

I think it's fine to have a #define for DEFAULT_APPEND_COST_FACTOR, because as you say it's used twice, but I don't think that should be exposed in cost.h; I'd make it private to costsize.c and rename it to something like APPEND_CPU_COST_MULTIPLIER. The word DEFAULT, in particular, seems useless to me, since there's no provision for it to be overridden by a different value.

What testing, if any, can we think about doing with this plan to make sure it doesn't regress things? For example, if we do a TPC-H run with partitioned tables and partition-wise join enabled, will any plans change with this patch? Do they get faster or not? Anyone have other ideas for what to test?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
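[Editor's note: to make the arithmetic concrete, here is a minimal stand-alone sketch of the per-tuple charge under discussion. The constant values, the APPEND_CPU_COST_MULTIPLIER name, and its 0.5 value are assumptions for illustration ("half of cpu_tuple_cost", per the message above), not the patch's actual definitions.]

```c
#include <assert.h>

/* Hypothetical stand-ins for the planner's GUC-derived cost constants. */
#define CPU_TUPLE_COST		0.01
#define CPU_OPERATOR_COST	0.0025

/*
 * Private multiplier, per the suggestion: defined in the costing file only,
 * not exposed in a header, and without the misleading DEFAULT_ prefix since
 * nothing can override it.
 */
#define APPEND_CPU_COST_MULTIPLIER	0.5

/* Sketch: per-tuple run cost MergeAppend would accumulate under this scheme. */
static double
merge_append_per_tuple_cost(double tuples)
{
	double		run_cost = 0.0;

	/* small charge per extracted tuple, as in cost_merge_append() */
	run_cost += CPU_OPERATOR_COST * tuples;
	/* plus the Append-node-style overhead the patch introduces */
	run_cost += CPU_TUPLE_COST * APPEND_CPU_COST_MULTIPLIER * tuples;
	return run_cost;
}
```

With these placeholder constants, 1000 tuples cost 2.5 (operator charge) plus 5.0 (append overhead), i.e. 7.5 cost units.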
Re: [HACKERS] Adding table_constraint description in ALTER TABLE synopsis
On Thu, Oct 26, 2017 at 3:23 PM, Lætitia Avrot wrote: > In documentation, I've found that table_constraint is used in the ALTER > TABLE synopsis but that definition of table_constraint is missing, so I > submitted bug #14873. > > I found the table_constraint definition in the CREATE TABLE synopsis and I > just copied/pasted it on the ALTER TABLE synopsis. > > The patch should apply to MASTER.I build and tested it successfully on my > computer. > > There shouldn't be any platform-specific content. > > You will find enclosed my patch. I tried my best to follow instructions on > how to submit a patch. I'd say you did a good job. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typos in src/backend/optimizer/README
On Fri, Oct 27, 2017 at 9:34 AM, Etsuro Fujita wrote: > This sentence in the section of Partition-wise joins in > src/backend/optimizer/README should be fixed: "This technique of breaking > down a join between partition tables into join between their partitions is > called partition-wise join." > > (1) s/a join between partition tables/a join between partitioned tables/ > (2) s/join between their partitions/joins between their partitions/ > > It might be okay to leave #2 as-is, but I'd like to propose to change that > way to make the meaning clear. I think you are right. I have committed the patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] unsafe tuple releasing in get_default_partition_oid
On Sat, Oct 28, 2017 at 10:03 AM, Julien Rouhaud wrote: > I just noticed that get_default_partition_oid() tries to release the > tuple even if it isn't valid. > Trivial patch attached. Oops. I wonder how that managed to survive testing. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Thu, Oct 26, 2017 at 4:11 PM, Masahiko Sawada wrote: >> Why do we want the the backend to linger behind, once it has added its >> foreign transaction entries in the shared memory and informed resolver >> about it? The foreign connections may take their own time and even >> after that there is no guarantee that the foreign transactions will be >> resolved in case the foreign server is not available. So, why to make >> the backend wait? > > Because I don't want to break the current user semantics. that is, > currently it's guaranteed that the subsequent reads can see the > committed result of previous writes even if the previous transactions > were distributed transactions. Right, this is very important, and having the backend wait for the resolver(s) is, I think, the right way to implement it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel safety for extern params
On Mon, Oct 16, 2017 at 12:59 PM, Amit Kapila wrote: >> I think the Param case should be mentioned after "... but" not before >> - i.e. referencing the child node's output... but setrefs.c might also >> have copied a Const or Param is-is. > > I am not sure if we can write the comment like that (.. copied a Const > or Param as-is.) because fix_upper_expr_mutator in setrefs.c has a > special handling for Var and Param where constants are copied as-is > via expression_tree_mutator. Also as proposed, the handling for > params is more like Var in exec_save_simple_expr. I committed fix_parallel_mode_nested_execution_v2.patch with some cosmetic tweaks. I back-patched it to 10 and 9.6, then had to fix some issues reported by Tom as followup commits. With respect to the bit quoted above, I rephrased the comment in a slightly different way that hopefully is a reasonable compromise, combined it with the main patch, and pushed it to master. Along the way I also got rid of the if statement you introduced and just made the Assert() more complicated instead, which seems better to me. When I tried back-porting the patch to v10 I discovered that the plpgsql changes conflict heavily and that ripping them out all the way produces regression failures under force_parallel_mode. I think I see why that's happening but it's not obvious how to me how to adapt b73f1b5c29e0ace5a281bc13ce09dea30e1b66de to the v10 code. Do you want to have a crack at it or should we just leave this as a master-only fix? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Local indexes for partitioned table
On Fri, Oct 27, 2017 at 7:27 PM, Alvaro Herrera wrote: > I noticed that RelationBuildPartitionKey is generating a partition key > in a temp context, then creating a private context and copying the key > into that. That seems leftover from some previous iteration of some > other patch; I think it's pretty reasonable to create the new context > right from the start and allocate the key there directly instead. Then > there's no need for copy_partition_key at all. We could do that, but the motivation for the current system was to avoid leaking memory in a long-lived context. I think the same technique is used elsewhere for similar reasons. I admit I haven't checked whether there would actually be a leak here if we did it as you propose, but I wouldn't find it at all surprising. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index only scan for cube and seg
On Thu, Oct 26, 2017 at 12:22 PM, Andrey Borodin wrote: > For cube there is new default opclass. I seem to recall that changing the default opclass causes unsolvable problems with upgrades. You might want to check the archives for previous discussions of this issue; unfortunately, I don't recall the details off-hand. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 4:41 PM, Simon Riggs wrote: > I didn't say it but my intention was to just throw an ERROR if no > single unique index can be identified. > > It could be possible to still run MERGE in that situation but we would > need to take a full table lock at ShareRowExclusive. It's quite likely > that such statements would throw duplicate update errors, so I > wouldn't be aiming to do anything with that for PG11. Like Peter, I think taking such a strong lock for a DML statement doesn't sound like a very desirable way forward. It means, for example, that you can only have one MERGE in progress on a table at the same time, which is quite limiting. It could easily be the case that you have multiple MERGE statements running at once but they touch disjoint groups of rows and therefore everything works. I think the code should be able to cope with concurrent changes, if nothing else by throwing an ERROR, and then if the user wants to ensure that doesn't happen by taking ShareRowExclusiveLock they can do that via an explicit LOCK TABLE statement -- or else they can prevent concurrency by any other means they see fit. Other problems with taking ShareRowExclusiveLock include (1) probable lock upgrade hazards and (2) do you really want MERGE to kick autovacuum off of your giant table? Probably not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Oct 27, 2017 at 10:55 AM, Simon Riggs wrote: > Questions? I think one of the reasons why Peter Geoghegan decided to pursue INSERT .. ON CONFLICT UPDATE was that, because it is non-standard SQL syntax, he felt free to mandate a non-standard SQL requirement, namely the presence of a unique index on the arbiter columns. If MERGE's join clause happens to involve equality conditions on precisely the same set of columns as some unique index on the target table, then I think you can reuse the INSERT .. ON CONFLICT UPDATE infrastructure and I suspect there won't be too many problems. However, if it doesn't, then what? You could decree that such cases will fail, but that might not meet your use case for developing the feature. Or you could try to soldier on without the INSERT .. ON CONFLICT UPDATE machinery, but that means, I think, that sometimes you will get serialization anomalies - at least, I think, you will sometimes obtain results that couldn't have been obtained under any serial order of execution, and maybe it would end up being possible to fail with serialization errors or unique index violations. In the past, there have been objections to implementations of MERGE which would give rise to such serialization anomalies, but I'm not sure we should feel bound by those discussions. One thing that's different is that the common and actually-useful case can now be made to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE; if less useful cases are vulnerable to some weirdness, maybe it's OK to just document the problems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: BRIN bloom indexes
On Fri, Oct 27, 2017 at 2:55 PM, Alvaro Herrera wrote: > I was rather thinking that if we can make this very robust against the > index growing out of proportion, we should consider ditching the > original minmax and replace it with multirange minmax, which seems like > it'd have much better behavior. If the multirange stuff can be done in such a way that it's just an updated version of the same opclass, and backward-compatible on disk, then I think this would be OK. But otherwise I don't think we ditch what already exists. That would break upgrades via both pg_upgrade and pg_dump, which seems like too high a price to pay to get rid of some arguably-worse code. It's actually WORSE to drop an opclass (which will make dumps not restore) than to do something like bump HASH_VERSION (which doesn't affect pg_dump at all and for pg_upgrade only requires post-upgrade steps rather than failing outright). > I don't see any reason to put any of this in contrib. Well, for one thing, it makes it easier to drop stuff later if we decide we don't really want it. I think that whichever BRIN opclasses are thought to be high quality and of general utility can go into core, just as we've done with other index AMs. However, if we think that something is useful-ish but maybe not something to which we want to make a permanent commitment, putting it into contrib is good for that. Upgrades are easier for things in contrib, too, because there's a built-in mechanism for people to try updating the SQL extensions (ALTER EXTENSION .. UPDATE) and if it fails they can adjust things and try again. When you just make a hard change to SQL definitions in a new release, any failures that result from those changes just turn into upgrade failures, which is IMHO a lot more painful than a failure to update an extension version while the database is still up and usable the whole time. 
For instance, if pg_stat_activity were bundled in an extension and we made the C code backward-compatible with old extension versions, then some of the upgrade pain users have had with that over the years could have been avoided. People could update to the new server version without failures and then at their leisure try to update the extension. If the update failed due to dependencies, then they would have time to figure out what to do about it and try again later; in the meantime, they'd be on the new version. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)
teaches libpq to ignore the NegotiateProtocolVersion message. With that patch applied, make check-world passes, which seems to show that the server-side changes are not totally broken. More testing would be great, of course. Some thoughts on the future: - libpq should grow an option to force a specific protocol version. Andres already proposed one to force 2.0, but now we probably want to generalize that to also allow forcing a particular minor version. That seems useful for testing, if nothing else, and might let us add TAP tests that this stuff works as intended. - Possibly we should commit the portion of the testing patch which ignores NegotiateProtocolVersion to v11, maybe also adding a connection status function that lets users inquire about whether a NegotiateProtocolVersion message was received and, if so, what parameters it reported as unrecognized and what minor version it reported the server as speaking. The existing PQprotocolVersion interface looks inadequate, as it seems to return only the major version. - On further reflection, I think the reconnect functionality you proposed previously is probably a good idea. It won't be necessary with servers that have been patched to send NegotiateProtocolVersion, but there may be quite a few that haven't for a long time to come, and although an automated reconnect is a little annoying, it's a lot less annoying than an outright connection failure. So that part of your patch should probably be resubmitted when and if we bump the version to 3.1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company pgwire-be-rmh.patch Description: Binary data pgwire-libpq-test.patch Description: Binary data
Re: [HACKERS] Re: Is anything preventing us from allowing write to foreign tables from standby?
On Thu, Oct 26, 2017 at 11:14 AM, Ashutosh Bapat wrote: > In a nearby thread, we are discussing about atomic commit of > transactions involving foreign transactions. For maintaining > consistency, atomicity of transactions writing to foreign server, we > will need to create local transactions. Will that be possible on > standby? Obviously, we could add a restriction that no consistency and > atomic commit is guaranteed when foreign servers are written from a > standby. I am not sure how easy would be to impose such a restriction > and whether such a restriction would be practical. Yeah, that feature definitely can't work on a standby. But we could probably allow writes when that feature is not in use. One thing that bothers me about Alexander's patch is that there wouldn't be any way to distinguish between those two cases. Maybe we need a callback to ask the FDW "given the configured options, could you work on a standby?" However, as Michael also points out, it's arguably wrong to allow a nominally read-only transaction to write data regardless of whether it works. In the case of a standby it could be argued that your transaction is only read-only because you had no other choice, but nonetheless that's how it is marked. I have a feeling that if we extend the definition of "read only" to mean "sometimes allow writes", we may regret it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Implementing pg_receivewal --no-sync
On Wed, Oct 25, 2017 at 10:03 PM, Michael Paquier wrote: > This sentence is actually wrong, a feedback message is never sent with > the feedback message. Eh? I think this looks basically fine, though I'd omit the short option for it. There are only so many letters in the alphabet, so let's not use them up for developer-convenience options. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company