Re: [HACKERS] Reviewing freeze map code
On Tue, Jun 21, 2016 at 9:21 AM, Andres Freund wrote:
>
> On 2016-06-21 08:59:13 +0530, Amit Kapila wrote:
> > Can we consider using some strategy to avoid deadlocks without releasing
> > the lock on the old page? Consider if we could have a mechanism such that
> > RelationGetBufferForTuple() will ensure that it always returns a new buffer
> > which has a targetblock greater than the old block (on which we already
> > held a lock). I think the tricky part here is whether we can get anything
> > like that from the FSM. Also, there could be cases where we need to extend
> > the heap when there were pages in the heap with space available, but we
> > have ignored them because their block number is smaller than the block
> > number on which we have a lock.
>
> I can't see that being acceptable, from a space-usage POV.
>
> > > So far the best idea I have - and it's really not a good one - is to
> > > invent a new hint-bit that tells concurrent updates to acquire a
> > > heavyweight tuple lock, while releasing the page-level lock. If that
> > > hint bit does not require any other modifications - and we don't need an
> > > xid in xmax for this use case - that'll avoid doing all the other
> > > `already_marked` stuff early, which should address the correctness
> > > issue.
> >
> > Don't we need to clear such a flag in case of error? Also, don't we need
> > to reset it later, like when modifying the old page later before WAL?
>
> If the flag just says "acquire a heavyweight lock", then there's no need
> for that. That's cheap enough to just do if it's erroneously set. At
> least I can't see any reason.

I think it will just increase the chances of other backends acquiring a
heavyweight lock.

> > > It's kinda invasive though, and probably has performance
> > > implications.
> >
> > Do you see a performance implication due to the requirement of a
> > heavyweight tuple lock in more cases than now, or something else?
>
> Because of that, yes.

> > Some other ways could be:
> >
> > Before releasing the lock on the buffer containing the old tuple, clear
> > the VM and visibility info from the page and WAL-log it. I think this
> > could impact performance depending on how frequently we need to perform
> > this action.
>
> Doubling the number of xlog inserts in heap_update would certainly be
> measurable :(. My guess is that the heavyweight tuple lock approach will
> be less expensive.

Probably, but I think the heavyweight tuple lock is more invasive. I think
increasing the number of xlog inserts could surely impact performance,
depending upon how frequently we need to call it. I think we might want to
combine it with the idea of having RelationGetBufferForTuple() return a
higher block number, such that if we don't find a higher block number from
the FSM, then we can release the lock on the old page and try to get the
locks on the old and new buffers as we do now. This will further reduce the
chances of increasing xlog insert calls and address the issue of space
wastage.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Reviewing freeze map code
On Tue, Jun 21, 2016 at 9:16 AM, Thomas Munro wrote:
>
> On Tue, Jun 21, 2016 at 3:29 PM, Amit Kapila wrote:
> > Some other ways could be:
> >
> > Before releasing the lock on the buffer containing the old tuple, clear
> > the VM and visibility info from the page and WAL-log it. I think this
> > could impact performance depending on how frequently we need to perform
> > this action.
> >
> > Have a new flag like HEAP_XMAX_UNLOGGED (as it was there when this logic
> > was introduced in commit f2bfe8a24c46133f81e188653a127f939eb33c4a) and set
> > the same in the old tuple header before releasing the lock on the buffer,
> > and teach tqual.c to honor the flag. I think tqual.c should consider
> > HEAP_XMAX_UNLOGGED as an indication of an aborted transaction unless it is
> > currently in-progress. Also, I think we need to clear this flag before WAL
> > logging in heap_update.
>
> I also noticed that and wondered whether it was a mistake to take that
> out. It appears to have been removed as part of the logic to clear
> away UNDO log support in 11919160, but it may have been an important
> part of the heap_update protocol. Though (as I mentioned nearby in a
> reply to Noah) I'm not sure if the tqual.c side, which would ignore
> the unlogged xmax in the event of a badly timed crash, was ever
> implemented.

Right, my observation is similar to yours, and that's what I am suggesting
as one alternative to fix this issue. I think making this approach work
(even if it doesn't have any problems) might turn out to be tricky.
However, the plus point of this approach seems to be that it shouldn't
impact performance in most cases.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Reviewing freeze map code
On Tue, Jun 21, 2016 at 9:08 AM, Thomas Munro wrote:
>
> On Tue, Jun 21, 2016 at 3:29 PM, Amit Kapila wrote:
> > On Tue, Jun 21, 2016 at 1:03 AM, Andres Freund wrote:
> >> Well, I think generally nobody seriously looked at actually refactoring
> >> heap_update(), even though that'd be a good idea. But in this instance,
> >> the problem seems relatively fundamental:
> >>
> >> We need to lock the origin page, to do visibility checks, etc. Then we
> >> need to figure out the target page. Even disregarding toasting - which
> >> we could be doing earlier with some refactoring - we're going to have to
> >> release the page-level lock, to lock them in ascending order. Otherwise
> >> we'll risk kinda likely deadlocks.
> >
> > Can we consider using some strategy to avoid deadlocks without releasing
> > the lock on the old page? Consider if we could have a mechanism such that
> > RelationGetBufferForTuple() will ensure that it always returns a new
> > buffer which has a targetblock greater than the old block (on which we
> > already held a lock). I think the tricky part here is whether we can get
> > anything like that from the FSM. Also, there could be cases where we need
> > to extend the heap when there were pages in the heap with space available,
> > but we have ignored them because their block number is smaller than the
> > block number on which we have a lock.
>
> Doesn't that mean that over time, given a workload that does only or
> mostly updates, your records tend to migrate further and further away
> from the start of the file, leaving a growing unusable space at the
> beginning, until you eventually need to CLUSTER/VACUUM FULL?

The updated tuples should ideally fit in the same page as the old tuple in
many cases if the fillfactor is properly configured, considering
update-mostly loads. Why would the records always migrate further away?
They should get the space freed by other updates in intermediate pages.
I think there could be some impact space-wise, but freed-up space will
eventually be used.

> I was wondering about speculatively asking for a free page with a
> lower block number than the origin page, if one is available, before
> locking the origin page.

Do you want to lock it as well? In any case, I think adding the code
without deciding whether the update can be accommodated in the current
page can prove to be costly.

> Then after locking the origin page, if it
> turns out you need a page but didn't get it earlier, asking for a free
> page with a higher block number than the origin page.

Something like that might work out if it is feasible and people agree on
pursuing such an approach.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Missing checks when malloc returns NULL...
Hi all,

While auditing the code, I got surprised that there are a couple of code
paths that do nothing for this error handling:
- pg_regress and isolationtester use malloc extensively; in case of
  failure those would just crash. I think that it matters for buildfarm
  members that are under memory pressure to not do so, so those should use
  pg_malloc instead.
- refint.c makes use of malloc to store plans in the top memory context.
  That's clearly a buggy concept... This code would need to be reworked
  more largely than in the patch I attach.
- pg_dlsym for darwin uses malloc, but would crash on failure.
- ps_status.c does nothing when it uses malloc().
- sprompt.c uses malloc once, and would crash on failure.
- mcxt.c uses that, which is surprising:

@@ -704,7 +704,8 @@ MemoryContextCreate(NodeTag tag, Size size,
     {
         /* Special case for startup: use good ol' malloc */
         node = (MemoryContext) malloc(needed);
-        Assert(node != NULL);
+        if (node == NULL)
+            elog(PANIC, "out of memory");
     }

I think that a PANIC is cleaner here instead of a simple crash.

So attached is a patch aimed at improving things. Thoughts?
--
Michael

malloc-nulls.patch
Description: invalid/octet-stream

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
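The fail-fast behavior being proposed for pg_regress and isolationtester can be sketched as a small wrapper around malloc. This is an illustrative sketch only; `xmalloc_sketch` is a hypothetical name, not the actual pg_malloc implementation:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of a fail-fast allocator in the style of pg_malloc:
 * instead of letting callers dereference a NULL pointer and crash, report
 * the out-of-memory condition and exit cleanly.
 */
static void *
xmalloc_sketch(size_t size)
{
    void       *result;

    /* malloc(0) behavior is implementation-defined; normalize to 1 byte */
    if (size == 0)
        size = 1;

    result = malloc(size);
    if (result == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return result;
}
```

A caller can then use the returned pointer unconditionally, which is the point of centralizing the NULL check in one place.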
Re: [HACKERS] parallel.c is not marked as test covered
On Tue, Jun 21, 2016 at 1:24 AM, Robert Haas wrote:
>
> On Mon, Jun 20, 2016 at 1:13 PM, Tom Lane wrote:
> > Robert Haas writes:
> >> On Sun, Jun 19, 2016 at 10:23 PM, Tom Lane wrote:
> >>> Personally, I'm +1 for such tinkering if it makes the feature either
> >>> more controllable or more understandable. After reading the comments at
> >>> the head of nodeGather.c, though, I don't think that single_copy is
> >>> either understandable or useful, and merely renaming it won't help.
> >>> Apparently, it runs code in the worker, except when it doesn't, and even
> >>> when it does, it's absolutely guaranteed to be a performance loss
> >>> because the leader is doing nothing. What in the world is the point?
> >
> >> The single_copy flag allows a Gather node to have a child plan which
> >> is not intrinsically parallel.
> >
> > OK, but I'm very dubious of your claim that this has any use except for
> > testing purposes. It certainly has no such use today. Therefore, the
> > behavior of falling back to running in the leader seems like an
> > anti-feature to me. If we want to test running in a worker, then we
> > want to test that, not maybe test it.
> >
> > I propose that the behavior we actually want here is to execute in
> > a worker, full stop. If we can't get one, wait till we can. If we
> > can't get one because no slots have been allocated at all, fail.
> > That would make the behavior deterministic enough for the regression
> > tests to rely on it.
>
> +1. I agree that for force_parallel_mode testing, that behavior would be
> useful.
>
> I am also pretty confident we're going to want the behavior where the
> leader runs the plan if, and only if, no workers can be obtained for
> other purposes. However, I agree that's not essential today.
>
> > And while I don't know what this mode should be called, I'm pretty sure
> > that neither "single_copy" nor "pipeline" are useful descriptions.
>
> Maybe we should make this an enum rather than a boolean, since there
> seem to be more than two useful behaviors.

How about calling it control_parallel/control_parallelism or
override_parallel/override_parallelism?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Reviewing freeze map code
On 2016-06-21 08:59:13 +0530, Amit Kapila wrote:
> Can we consider using some strategy to avoid deadlocks without releasing
> the lock on the old page? Consider if we could have a mechanism such that
> RelationGetBufferForTuple() will ensure that it always returns a new buffer
> which has a targetblock greater than the old block (on which we already
> held a lock). I think the tricky part here is whether we can get anything
> like that from the FSM. Also, there could be cases where we need to extend
> the heap when there were pages in the heap with space available, but we
> have ignored them because their block number is smaller than the block
> number on which we have a lock.

I can't see that being acceptable, from a space-usage POV.

> > So far the best idea I have - and it's really not a good one - is to
> > invent a new hint-bit that tells concurrent updates to acquire a
> > heavyweight tuple lock, while releasing the page-level lock. If that
> > hint bit does not require any other modifications - and we don't need an
> > xid in xmax for this use case - that'll avoid doing all the other
> > `already_marked` stuff early, which should address the correctness
> > issue.
>
> Don't we need to clear such a flag in case of error? Also, don't we need
> to reset it later, like when modifying the old page later before WAL?

If the flag just says "acquire a heavyweight lock", then there's no need
for that. That's cheap enough to just do if it's erroneously set. At
least I can't see any reason.

> > It's kinda invasive though, and probably has performance
> > implications.
>
> Do you see a performance implication due to the requirement of a
> heavyweight tuple lock in more cases than now, or something else?

Because of that, yes.

> Some other ways could be:
>
> Before releasing the lock on the buffer containing the old tuple, clear
> the VM and visibility info from the page and WAL-log it. I think this
> could impact performance depending on how frequently we need to perform
> this action.

Doubling the number of xlog inserts in heap_update would certainly be
measurable :(. My guess is that the heavyweight tuple lock approach will
be less expensive.

Andres
Re: [HACKERS] Reviewing freeze map code
On Tue, Jun 21, 2016 at 3:29 PM, Amit Kapila wrote:
> Some other ways could be:
>
> Before releasing the lock on the buffer containing the old tuple, clear
> the VM and visibility info from the page and WAL-log it. I think this
> could impact performance depending on how frequently we need to perform
> this action.
>
> Have a new flag like HEAP_XMAX_UNLOGGED (as it was there when this logic
> was introduced in commit f2bfe8a24c46133f81e188653a127f939eb33c4a) and set
> the same in the old tuple header before releasing the lock on the buffer,
> and teach tqual.c to honor the flag. I think tqual.c should consider
> HEAP_XMAX_UNLOGGED as an indication of an aborted transaction unless it is
> currently in-progress. Also, I think we need to clear this flag before WAL
> logging in heap_update.

I also noticed that and wondered whether it was a mistake to take that
out. It appears to have been removed as part of the logic to clear
away UNDO log support in 11919160, but it may have been an important
part of the heap_update protocol. Though (as I mentioned nearby in a
reply to Noah) I'm not sure if the tqual.c side, which would ignore
the unlogged xmax in the event of a badly timed crash, was ever
implemented.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Reviewing freeze map code
On Tue, Jun 21, 2016 at 3:29 PM, Amit Kapila wrote:
> On Tue, Jun 21, 2016 at 1:03 AM, Andres Freund wrote:
>> Well, I think generally nobody seriously looked at actually refactoring
>> heap_update(), even though that'd be a good idea. But in this instance,
>> the problem seems relatively fundamental:
>>
>> We need to lock the origin page, to do visibility checks, etc. Then we
>> need to figure out the target page. Even disregarding toasting - which
>> we could be doing earlier with some refactoring - we're going to have to
>> release the page-level lock, to lock them in ascending order. Otherwise
>> we'll risk kinda likely deadlocks.
>
> Can we consider using some strategy to avoid deadlocks without releasing
> the lock on the old page? Consider if we could have a mechanism such that
> RelationGetBufferForTuple() will ensure that it always returns a new
> buffer which has a targetblock greater than the old block (on which we
> already held a lock). I think the tricky part here is whether we can get
> anything like that from the FSM. Also, there could be cases where we need
> to extend the heap when there were pages in the heap with space available,
> but we have ignored them because their block number is smaller than the
> block number on which we have a lock.

Doesn't that mean that over time, given a workload that does only or
mostly updates, your records tend to migrate further and further away
from the start of the file, leaving a growing unusable space at the
beginning, until you eventually need to CLUSTER/VACUUM FULL?

I was wondering about speculatively asking for a free page with a
lower block number than the origin page, if one is available, before
locking the origin page. Then after locking the origin page, if it
turns out you need a page but didn't get it earlier, asking for a free
page with a higher block number than the origin page.
--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
On Tue, Jun 21, 2016 at 12:06 PM, Tom Lane wrote:
> Peter Eisentraut writes:
>> On 6/20/16 10:29 PM, Tom Lane wrote:
>>> What I would want to know is whether this specific change is actually a
>>> good idea. In particular, I'm concerned about the possible security
>>> implications of exposing primary_conninfo --- might it not contain a
>>> password, for example?
>
>> That would have been my objection. This was also mentioned in the
>> context of moving recovery.conf settings to postgresql.conf, because
>> then the password would become visible in SHOW commands and the like.
>
>> Alternatively or additionally, implement a way to strip passwords out of
>> conninfo information. libpq already has information about which
>> connection items are sensitive.
>
> Yeah, I'd been wondering whether we could parse the conninfo string into
> individual fields and then drop the password field. It's hard to see a
> reason why this view needs to show passwords, since presumably everything
> in it corresponds to successful connections --- if your password is wrong,
> you aren't in it.

walreceiver.c does not have a direct dependency on libpq yet; everything
goes through libpqwalreceiver. So a correct move would be to unplug the
low-level routines of PQconninfoParse into something in src/common/, where
both the backend and frontend could use them. And then we use them to
rebuild a connection string. Thoughts?
--
Michael
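The parse-then-drop idea Tom describes can be sketched roughly as below. This is a deliberately simplified toy that splits key=value pairs on spaces only; real conninfo syntax also allows quoted values and escapes, which is exactly what libpq's PQconninfoParse handles and why moving those routines into src/common/ is attractive. The function name is hypothetical:

```c
#include <string.h>

/*
 * Simplified sketch: copy a conninfo string into "out", dropping any
 * password=... entry. A toy tokenizer for illustration; it does not
 * handle quoted values containing spaces, which libpq's PQconninfoParse
 * does. Not the actual proposed implementation.
 */
static void
strip_password_sketch(const char *conninfo, char *out, size_t outlen)
{
    const char *p = conninfo;

    out[0] = '\0';
    while (*p)
    {
        const char *start = p;

        /* find end of this key=value token */
        while (*p && *p != ' ')
            p++;

        /* keep everything except the password field */
        if (strncmp(start, "password=", 9) != 0)
        {
            if (out[0] != '\0' && strlen(out) + 1 < outlen)
                strcat(out, " ");
            if (strlen(out) + (size_t) (p - start) < outlen)
                strncat(out, start, (size_t) (p - start));
        }

        /* skip separators */
        while (*p == ' ')
            p++;
    }
}
```

The real version would rebuild the string from the parsed `PQconninfoOption` array rather than from raw text, so that quoting is preserved.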
Re: [HACKERS] Reviewing freeze map code
On Tue, Jun 21, 2016 at 1:03 AM, Andres Freund wrote:
>
> On 2016-06-15 08:56:52 -0400, Robert Haas wrote:
> > Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
> > and CTID without logging anything or clearing the all-visible flag and
> > then releases the lock on the heap page to go do some more work that
> > might even ERROR out. Only if that other work all goes OK do we
> > relock the page and perform the WAL-logged actions.
> >
> > That doesn't seem like a good idea even in existing releases, because
> > you've taken a tuple on an all-visible page and made it not
> > all-visible, and you've made a page modification that is not
> > necessarily atomic without logging it.
>
> Right, that's broken.
>
> > I'm not sure what to do about this: this part of the heap_update()
> > logic has been like this forever, and I assume that if it were easy to
> > refactor this away, somebody would have done it by now.
>
> Well, I think generally nobody seriously looked at actually refactoring
> heap_update(), even though that'd be a good idea. But in this instance,
> the problem seems relatively fundamental:
>
> We need to lock the origin page, to do visibility checks, etc. Then we
> need to figure out the target page. Even disregarding toasting - which
> we could be doing earlier with some refactoring - we're going to have to
> release the page-level lock, to lock them in ascending order. Otherwise
> we'll risk kinda likely deadlocks.

Can we consider using some strategy to avoid deadlocks without releasing
the lock on the old page? Consider if we could have a mechanism such that
RelationGetBufferForTuple() will ensure that it always returns a new buffer
which has a targetblock greater than the old block (on which we already
held a lock). I think the tricky part here is whether we can get anything
like that from the FSM. Also, there could be cases where we need to extend
the heap when there were pages in the heap with space available, but we
have ignored them because their block number is smaller than the block
number on which we have a lock.

> We also certainly don't want to nest
> the lwlocks for the toast stuff, inside a content lock for the old
> tuple's page.
>
> So far the best idea I have - and it's really not a good one - is to
> invent a new hint-bit that tells concurrent updates to acquire a
> heavyweight tuple lock, while releasing the page-level lock. If that
> hint bit does not require any other modifications - and we don't need an
> xid in xmax for this use case - that'll avoid doing all the other
> `already_marked` stuff early, which should address the correctness
> issue.

Don't we need to clear such a flag in case of error? Also, don't we need
to reset it later, like when modifying the old page later before WAL?

> It's kinda invasive though, and probably has performance
> implications.

Do you see a performance implication due to the requirement of a
heavyweight tuple lock in more cases than now, or something else?

Some other ways could be:

Before releasing the lock on the buffer containing the old tuple, clear
the VM and visibility info from the page and WAL-log it. I think this
could impact performance depending on how frequently we need to perform
this action.

Have a new flag like HEAP_XMAX_UNLOGGED (as it was there when this logic
was introduced in commit f2bfe8a24c46133f81e188653a127f939eb33c4a) and set
the same in the old tuple header before releasing the lock on the buffer,
and teach tqual.c to honor the flag. I think tqual.c should consider
HEAP_XMAX_UNLOGGED as an indication of an aborted transaction unless it is
currently in-progress. Also, I think we need to clear this flag before WAL
logging in heap_update.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
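The ascending-order rule Andres refers to (and that the targetblock-greater-than-old idea tries to preserve without dropping the first lock) can be illustrated with a trivial helper. This is an illustrative sketch, not PostgreSQL code; it only shows why ordering lock acquisition by block number rules out the classic lock-inversion deadlock between two concurrent updaters:

```c
/*
 * Illustrative sketch (not PostgreSQL code): compute the order in which
 * two heap pages should be locked. If every backend locks pages in
 * ascending block-number order, no two backends can each hold the page
 * the other is waiting for, so lock-inversion deadlocks are impossible.
 */
static void
plan_lock_order(int oldblk, int newblk, int order[2], int *nlocks)
{
    if (oldblk == newblk)
    {
        /* update stays on the same page: a single lock suffices */
        order[0] = oldblk;
        *nlocks = 1;
    }
    else
    {
        /* lower-numbered block always first */
        order[0] = (oldblk < newblk) ? oldblk : newblk;
        order[1] = (oldblk < newblk) ? newblk : oldblk;
        *nlocks = 2;
    }
}
```

The cost of this rule is exactly what the thread discusses: when the target block turns out to be lower than the origin block, the origin lock must be released and reacquired in order, opening the window heap_update currently has to survive.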
Re: [HACKERS] parallel.c is not marked as test covered
Peter Eisentraut writes:
> On 6/19/16 5:55 PM, Tom Lane wrote:
>> Depending on what the percentage actually is, maybe we could treat
>> this like the "random" test, and allow a failure to be disregarded
>> overall? But that doesn't seem very nice either, in view of our
>> increasing reliance on automated testing. If "random" were failing
>> 90% of the time on some buildfarm critters, that would probably
>> indicate a real problem, but we'd likely not realize it for a long time.

> I think this test would only fail if it runs out of workers, and that
> would only happen in an installcheck run against a server configured in
> a nonstandard way or that is doing something else -- which doesn't
> happen on the buildfarm.

Um, if you're speaking of select_parallel, that already runs in parallel
with two other regression tests, and there is no annotation in the
parallel_schedule file suggesting that adding more scripts to that group
would be bad. But yes, perhaps putting this test into its own standalone
group would be enough of a fix.

regards, tom lane
Re: [HACKERS] parallel.c is not marked as test covered
On 6/19/16 5:55 PM, Tom Lane wrote:
> Depending on what the percentage actually is, maybe we could treat
> this like the "random" test, and allow a failure to be disregarded
> overall? But that doesn't seem very nice either, in view of our
> increasing reliance on automated testing. If "random" were failing
> 90% of the time on some buildfarm critters, that would probably
> indicate a real problem, but we'd likely not realize it for a long time.

I think this test would only fail if it runs out of workers, and that
would only happen in an installcheck run against a server configured in
a nonstandard way or that is doing something else -- which doesn't
happen on the buildfarm.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
Peter Eisentraut writes:
> On 6/20/16 10:29 PM, Tom Lane wrote:
>> What I would want to know is whether this specific change is actually a
>> good idea. In particular, I'm concerned about the possible security
>> implications of exposing primary_conninfo --- might it not contain a
>> password, for example?

> That would have been my objection. This was also mentioned in the
> context of moving recovery.conf settings to postgresql.conf, because
> then the password would become visible in SHOW commands and the like.

> Alternatively or additionally, implement a way to strip passwords out of
> conninfo information. libpq already has information about which
> connection items are sensitive.

Yeah, I'd been wondering whether we could parse the conninfo string into
individual fields and then drop the password field. It's hard to see a
reason why this view needs to show passwords, since presumably everything
in it corresponds to successful connections --- if your password is wrong,
you aren't in it.

regards, tom lane
Re: [HACKERS] Parallel query and temp_file_limit
Peter Geoghegan writes:
> On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
>> What I'm tempted to do is trying to document that, as a point of
>> policy, parallel query in 9.6 uses up to (workers + 1) times the
>> resources that a single session might use. That includes not only CPU
>> but also things like work_mem and temp file space. This obviously
>> isn't ideal, but it's what could be done by the ship date.

> Where would that be documented, though? Would it need to be noted in
> the case of each such GUC?

Why can't we just note this in the number-of-workers GUCs? It's not like
there even *is* a GUC for many of our per-process resource consumption
behaviors.

regards, tom lane
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
On 6/20/16 10:29 PM, Tom Lane wrote:
> What I would want to know is whether this specific change is actually a
> good idea. In particular, I'm concerned about the possible security
> implications of exposing primary_conninfo --- might it not contain a
> password, for example?

That would have been my objection. This was also mentioned in the
context of moving recovery.conf settings to postgresql.conf, because
then the password would become visible in SHOW commands and the like.

We would need a way to put the password in a separate place, like a
primary_conn_password setting. Yes, you can already use .pgpass for
that, but since pg_basebackup -R will happily copy a password out of
.pgpass into recovery.conf, this makes accidentally leaking passwords
way too easy.

Alternatively or additionally, implement a way to strip passwords out of
conninfo information. libpq already has information about which
connection items are sensitive.

Needs more thought, in any case.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
Michael Paquier writes:
> On Tue, Jun 21, 2016 at 11:29 AM, Tom Lane wrote:
>> What I would want to know is whether this specific change is actually a
>> good idea. In particular, I'm concerned about the possible security
>> implications of exposing primary_conninfo --- might it not contain a
>> password, for example?

> Yes it could, as a connection string, but we make the information of
> this view only visible to superusers. For the others, that's just
> NULL.

Well, that's okay for now, but I'm curious to hear Stephen Frost's
opinion on this. He's been on the warpath to decrease our dependence on
superuser-ness for protection purposes. Seems to me that having one
column in this view that is a lot more security-sensitive than the others
is likely to be an issue someday.

regards, tom lane
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
On Tue, Jun 21, 2016 at 11:29 AM, Tom Lane wrote:
> Michael Paquier writes:
>> On Tue, Jun 21, 2016 at 9:58 AM, Tatsuo Ishii wrote:
>>> There even seem to be ongoing discussions on changing the version number
>>> while in the beta period (and that definitely requires initdb). Why
>>> not change the system catalog during beta? :-)
>
>> I am not directly against that to be honest, but I'd expect Tom's
>> wrath showing up soon on this thread just by saying that. In the last
>> two releases, catalog bumps happened before beta2 because there was no
>> other choice. This issue is not really critical, just a stupid miss from
>> me, and we can live with this mistake as well.
>
> Since pg_stat_wal_receiver is new in 9.6, it seems to me that it'd be
> wise to try to get it right the first time. And it's not like we are
> going to get to beta3 without another initdb --- we already know the
> partial-aggregate design is broken and needs some more catalog changes.

Amen. That's a sufficient argument to slip this one into 9.6 then.

> What I would want to know is whether this specific change is actually a
> good idea. In particular, I'm concerned about the possible security
> implications of exposing primary_conninfo --- might it not contain a
> password, for example?

Yes it could, as a connection string, but we make the information of
this view only visible to superusers. For the others, that's just
NULL.
--
Michael
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
Michael Paquier writes:
> On Tue, Jun 21, 2016 at 9:58 AM, Tatsuo Ishii wrote:
>> There even seem to be ongoing discussions on changing the version number
>> while in the beta period (and that definitely requires initdb). Why
>> not change the system catalog during beta? :-)

> I am not directly against that to be honest, but I'd expect Tom's
> wrath showing up soon on this thread just by saying that. In the last
> two releases, catalog bumps happened before beta2 because there was no
> other choice. This issue is not really critical, just a stupid miss from
> me, and we can live with this mistake as well.

Since pg_stat_wal_receiver is new in 9.6, it seems to me that it'd be
wise to try to get it right the first time. And it's not like we are
going to get to beta3 without another initdb --- we already know the
partial-aggregate design is broken and needs some more catalog changes.

What I would want to know is whether this specific change is actually a
good idea. In particular, I'm concerned about the possible security
implications of exposing primary_conninfo --- might it not contain a
password, for example?

regards, tom lane
Re: [HACKERS] parallel.c is not marked as test covered
Alvaro Herrera writes: > Tom Lane wrote: >> This seems like pretty good evidence that we should remove the "ignored" >> marking for the random test, and maybe remove that functionality from >> pg_regress altogether. We could probably adjust the test to decrease >> its risk-of-failure by another factor of ten or so, if anyone feels like >> 0.005% failure probability is too high. > I suppose that as far as the buildfarm goes it's okay that the test > fails from time to time, but it may be worse from packagers' points of > view, where a randomly failing test can wreck the whole building > process. Is a 0.005% failure probability low enough that nobody will be > bothered by that? As an ex-packager, I think that's a couple orders of magnitude below where anybody will notice it, let alone feel pain. There are other causes of failure that will dwarf this one. (You may recall that I used to bitch regularly about the failure probabilities for mysql's regression tests --- but that was because the probability of failure was on the order of 50%, when building in Red Hat's buildfarm.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0
On Tue, May 24, 2016 at 09:23:27AM -0500, Jim Nasby wrote: > On 5/16/16 2:36 AM, Bruce Momjian wrote: > >Right. I am thinking of writing some docs about how to avoid downtime > >for upgrades of various types. > > If there's some magic sauce to shrink pg_upgrade downtime to near 0 I think > folks would be very interested in that. > > Outside of that scenario, I think what would be far more useful is > information on how to do seamless master/replica switchovers using tools > like pgBouncer or pgPool. That ability is useful *all* the time, not just > when upgrading. It makes it trivial to do OS-level maintenance, and if > you're using a form of logical replication it also makes it trivial to do > expensive database maintenance, such as cluster/vacuum full/reindex. I've > worked with a few clients that had that ability and it was a huge stress > reducer. As a bonus, an unplanned outage of the master becomes far less > stressful, because you already know exactly how to fail over. pg_upgrade's runtime can't be decreased dramatically --- I wanted to document other methods like you described. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0
On Fri, May 20, 2016 at 07:40:53PM -0400, Robert Haas wrote: > On Mon, May 16, 2016 at 3:36 AM, Bruce Momjian wrote: > > On Sun, May 15, 2016 at 03:23:52PM -0500, Jim Nasby wrote: > >> 2) There's no ability at all to revert, other than restore a backup. That > >> means if you pull the trigger and discover some major performance problem, > >> you have no choice but to deal with it (you can't switch back to the old > >> version without losing data). > > > > In --link mode only > > No, not really. Once you let write transactions into the new cluster, > there's no way to get back to the old server version no matter which > option you used. Yes, there is, and it is documented: If you ran pg_upgrade without --link or did not start the new server, the old cluster was not modified except that, if linking started, a .old suffix was appended to $PGDATA/global/pg_control. To reuse the old cluster, possibly remove the .old suffix from $PGDATA/global/pg_control; you can then restart the old cluster. What is confusing you? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
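[Editor's note: the documented revert step above amounts to a single rename. A toy re-enactment, with illustrative paths only (this is not pg_upgrade itself, just the file-level effect Bruce describes):]

```python
import os
import tempfile

# In --link mode, once linking starts, pg_upgrade renames the old cluster's
# control file to pg_control.old; reverting to the old cluster (if the new
# server was never started) just renames it back before restarting.
pgdata = tempfile.mkdtemp()                      # stands in for the old $PGDATA
os.makedirs(os.path.join(pgdata, "global"))
ctl = os.path.join(pgdata, "global", "pg_control")
open(ctl + ".old", "w").close()                  # what pg_upgrade leaves behind

os.rename(ctl + ".old", ctl)                     # the documented revert step
print(os.path.exists(ctl))
```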
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
On Tue, Jun 21, 2016 at 9:58 AM, Tatsuo Ishii wrote: >>> Thanks, this looks good. Could you please add it to the next commitfest >>> so that it doesn't get lost and also so I can do an official review of it? >> >> Yes, I just did that. That's too late for 9.6 anyway with beta2 close by. > > Even there seems to be ongoing discussions on changing version number > while in the beta period (and which definitely requires initdb). Why > not changing system catalog during beta?:-) I am not directly against that, to be honest, but I'd expect Tom's wrath to show up soon on this thread just for saying that. In the last two releases, catalog bumps happened before beta2 because there was no other choice. This issue is not really critical, just a stupid miss on my part, and we can live with this mistake as well. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
>> Thanks, this looks good. Could you please add it to the next commitfest >> so that it doesn't get lost and also so I can do an official review of it? > > Yes, I just did that. That's too late for 9.6 anyway with beta2 close by. Even so, there seem to be ongoing discussions on changing the version number while in the beta period (which would definitely require initdb). Why not change the system catalog during beta? :-) Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
On Mon, Jun 20, 2016 at 11:26 PM, Vik Fearing wrote: > On 19/06/16 13:02, Michael Paquier wrote: >> On Sun, Jun 19, 2016 at 7:38 PM, Vik Fearing wrote: >>> On 19/06/16 12:28, Michael Paquier wrote: On Sun, Jun 19, 2016 at 5:56 PM, Michael Paquier Or in short the attached. >>> >>> The code looks good to me but why no documentation? >> >> Because that's a long day... Thanks! Now you have it. > > Thanks, this looks good. Could you please add it to the next commitfest > so that it doesn't get lost and also so I can do an official review of it? Yes, I just did that. That's too late for 9.6 anyway with beta2 close by. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel.c is not marked as test covered
On Mon, Jun 20, 2016 at 5:47 PM, Robert Haas wrote: > On Mon, Jun 20, 2016 at 4:52 PM, David G. Johnston wrote: > > Internal or external, I do think the number and type of flags described here, > > for the purposes described, seems undesirable from an architectural standpoint. > > Well, that seems like a bold statement to me, because I think that one > really has to understand why flags got added before deciding whether > there are too many of them, and it's not clear to me that you do. I don't. And I totally accept a response of: you are operating under a false premise here. My response above was more defensive than constructive. > What we're talking about here is ONE flag, which currently controls > whether the leader will participate in running Gather's child plan. > What happens when the flag is set to "leader should not participate" > and no workers are available? The current behavior is that the leader > runs the plan in lieu of the workers, and Tom is proposing to change > it so that we wait for a worker to become available instead. I think my biggest (again, I'm not digging deep here) misunderstanding is testing mode versus production mode and what is or is not visible to the test framework compared to SQL/libpq. > I am usually quite happy when people other than senior hackers weigh > in on threads here, especially about user-visible behavior, because > senior hackers can be quite myopic. We spend most of our time > developing the system rather than using it, and sometimes our notion > of what is useful or acceptable is distorted because of it. Moreover, > none of us are so smart that we can't be wrong, so a gut check from > somebody else is often helpful. > But, of course, an opinion that > proceeds from a false premise doesn't provide useful guidance. This is true - though I'd suspect not absolute. > Many people realize this, which is why we tend to get entirely too many > opinions on questions like "should we change the version number?" > and far too few on questions like "how should we refactor heap_update?". This is a non-sequitur - or I'm just too worn to follow it. I did not intentionally set out to spend an hour of my own time to waste 20 minutes of yours. I won't apologize for an earnest attempt at self-education and contribution, no matter how badly it turned out. I will endeavor to learn from it and avoid similar errors in the future. David J.
Re: [HACKERS] parallel.c is not marked as test covered
Tom Lane wrote: > This seems like pretty good evidence that we should remove the "ignored" > marking for the random test, and maybe remove that functionality from > pg_regress altogether. We could probably adjust the test to decrease > its risk-of-failure by another factor of ten or so, if anyone feels like > 0.005% failure probability is too high. I suppose that as far as the buildfarm goes it's okay that the test fails from time to time, but it may be worse from packagers' points of view, where a randomly failing test can wreck the whole building process. Is a 0.005% failure probability low enough that nobody will be bothered by that? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reviewing freeze map code
On Fri, Jun 17, 2016 at 3:36 PM, Noah Misch wrote: > I agree the non-atomic, unlogged change is a problem. A related threat > doesn't require a torn page: > > AssignTransactionId() xid=123 > heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, 123); > some ERROR before heap_update() finishes > rollback; -- xid=123 > some backend flushes the modified page > immediate shutdown > AssignTransactionId() xid=123 > commit; -- xid=123 > > If nothing wrote an xlog record that witnesses xid 123, the cluster can > reassign it after recovery. The failed update is now considered a successful > update, and the row improperly becomes dead. That's important. I wonder if that was originally supposed to be handled with the HEAP_XMAX_UNLOGGED flag which was removed in 11919160. A comment in the heap WAL logging commit f2bfe8a2 said that tqual routines would see the HEAP_XMAX_UNLOGGED flag in the event of a crash before logging (though I'm not sure if the tqual routines ever actually did that). -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel.c is not marked as test covered
I wrote: > Depending on what the percentage actually is, maybe we could treat > this like the "random" test, and allow a failure to be disregarded > overall? But that doesn't seem very nice either, in view of our > increasing reliance on automated testing. If "random" were failing > 90% of the time on some buildfarm critters, that would probably > indicate a real problem, but we'd likely not realize it for a long time. BTW, this is getting a bit off-topic for this thread, but: I got curious about whether any such effect might actually exist, and decided to grep the buildfarm logs to see how many times a failure of the "random" test had been ignored. The answer is that there are 35 such occurrences in otherwise-successful buildfarm runs, out of something like 65000 successful runs in the buildfarm database. This is in the noise, really. There are more occurrences than that of cases where "random ... failed (ignored)" appeared in a failed run, which more than likely means that some other regression test script caused a crash with collateral damage to this test. This seems like pretty good evidence that we should remove the "ignored" marking for the random test, and maybe remove that functionality from pg_regress altogether. We could probably adjust the test to decrease its risk-of-failure by another factor of ten or so, if anyone feels like 0.005% failure probability is too high. Raw data attached for amusement's sake. regards, tom lane sysname | snapshot | stage | logline ---+-+-+--- orangutan | 2015-01-02 08:00:09 | OK | random ... failed (ignored) dingo | 2015-01-22 17:29:03 | Check | random ... failed (ignored) fulmar | 2015-01-26 03:03:17 | OK | test random ... failed (ignored) nightjar | 2015-02-03 22:27:13 | Check | random ... failed (ignored) (test process exited with exit code 2) dromedary | 2015-03-09 18:57:28 | Check | random ... failed (ignored) (test process exited with exit code 2) olinguito | 2015-03-09 18:59:08 | Check | random ...
failed (ignored) (test process exited with exit code 2) rover_firefly | 2015-03-09 19:04:00 | Check | random ... failed (ignored) (test process exited with exit code 2) lapwing | 2015-03-09 19:15:01 | Check | random ... failed (ignored) (test process exited with exit code 2) prairiedog| 2015-03-09 19:20:35 | Check | random ... failed (ignored) (test process exited with exit code 2) dingo | 2015-03-09 19:29:00 | Check | random ... failed (ignored) (test process exited with exit code 2) binturong | 2015-03-09 19:42:28 | Check | random ... failed (ignored) (test process exited with exit code 2) shearwater| 2015-03-09 19:58:15 | Check | random ... failed (ignored) (test process exited with exit code 2) magpie| 2015-04-25 20:59:48 | OK | test random ... failed (ignored) anole | 2015-04-28 21:25:27 | Check | random ... failed (ignored) (test process exited with exit code 2) binturong | 2015-05-07 16:55:15 | pg_upgradeCheck | test random ... failed (ignored) (test process exited with exit code 2) dingo | 2015-05-07 16:58:01 | pg_upgradeCheck | test random ... failed (ignored) (test process exited with exit code 2) binturong | 2015-05-08 18:56:02 | pg_upgradeCheck | test random ... failed (ignored) (test process exited with exit code 2) reindeer | 2015-05-15 06:00:01 | OK | random ... failed (ignored) anole | 2015-05-28 17:58:54 | Check | random ... failed (ignored) (test process exited with exit code 2) anole | 2015-06-01 00:30:15 | Check | random ... failed (ignored) (test process exited with exit code 2) crake | 2015-06-01 18:53:02 | OK | test random ... failed (ignored) sittella | 2015-06-01 22:50:36 | OK | test random ... failed (ignored) anole | 2015-06-01 22:58:24 | Check | random ... failed (ignored) (test process exited with exit code 2) blesbok | 2015-06-27 18:49:43 | Check |
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote: > What I'm tempted to do is trying to document that, as a point of > policy, parallel query in 9.6 uses up to (workers + 1) times the > resources that a single session might use. That includes not only CPU > but also things like work_mem and temp file space. This obviously > isn't ideal, but it's what could be done by the ship date. Where would that be documented, though? Would it need to be noted in the case of each such GUC? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 05:11:17PM -0400, Robert Haas wrote: > On Mon, Jun 20, 2016 at 4:55 PM, Tom Lane wrote: > > No, the argument for it was that we'd no longer have to have the annual > > discussions about "is it 10.0 yet?". > > WHAT annual argument? Did anyone even argue that any 9.x release > prior to 9.6 deserved to be called 10.0? Maybe somebody suggested > that for 9.2 and it generated, like, four emails? I certainly don't > remember any discussion that remotely approached the amount of time > we've spent litigating both the version number and the version > numbering scheme in the last few months. I do think Robert is 100% accurate on this. Personally, I have never understood the "reduce the annual arguments" rationale, and the jumps to 8.0 and 9.0 were done in a positive way that I think provided value to our community. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 10.0
On Mon, May 16, 2016 at 10:16:48AM -0400, Tom Lane wrote: > Peter Eisentraut writes: > > On 5/16/16 9:53 AM, Greg Stark wrote: > >> I thought the idea was that Berkeley tossed an source tree over the > >> wall with no version number and then the first five releases were > >> Postgres95 0.x, Postgres95 1.0, Postgres95 1.0.1, Postgres95 1.0.2, > >> Postgres95 1.0.9. Then the idea was that PostgreSQL 6.0 was the sixth > >> major release counting those as the first five releases. > > > The last release out of Berkeley was 4.2. > > Correct --- I have a copy of that tarball. > > > Then Postgres95 was "5", and then PostgreSQL started at 6. > > I wasn't actually around at the time, but our commit history starts > with this: > > Author: Marc G. Fournier > Branch: master Release: REL6_1 [d31084e9d] 1996-07-09 06:22:35 + > > Postgres95 1.01 Distribution - Virgin Sources > > The first mention of 6.anything is here: > > Author: Bruce Momjian > Branch: master Release: REL6_1 [a2b7f6297] 1996-12-28 02:01:58 + > > Updated changes for 6.0. > > I see no references in the commit history to 5.anything, but there > are some references like this: The sole reason we jumped from Postgres 1.09 to 6.0 was that in Postgres 1.0.X, $PGDATA/PG_VERSION contained '5', meaning when Berkeley went from University Postgres 4.2 to Postgres95 1.0, they didn't reset PG_VERSION. We really had no way of going to Postgres 2.0 unless we were prepared to have data/PG_VERSION never match the major version number. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reviewing freeze map code
On 2016-06-20 17:55:19 -0400, Robert Haas wrote: > On Mon, Jun 20, 2016 at 4:24 PM, Andres Freund wrote: > > On 2016-06-20 16:10:23 -0400, Robert Haas wrote: > >> What exactly is the point of all of that already_marked stuff? > > > > Preventing the old tuple from being locked/updated by another backend, > > while unlocking the buffer. > > > >> I > >> mean, suppose we just don't do any of that before we go off to do > >> toast_insert_or_update and RelationGetBufferForTuple. Eventually, > >> when we reacquire the page lock, we might find that somebody else has > >> already updated the tuple, but couldn't that be handled by > >> (approximately) looping back up to l2 just as we do in several other > >> cases? > > > > We'd potentially have to undo a fair amount more work: the toasted data > > would have to be deleted and such, just to retry. Which isn't going to > > super easy, because all of it will be happening with the current cid (we > > can't just increase CommandCounterIncrement() for correctness reasons). > > Why would we have to delete the TOAST data? AFAIUI, the tuple points > to the TOAST data, but not the other way around. So if we change our > mind about where to put the tuple, I don't think that requires > re-TOASTing. Consider what happens if we, after restarting at l2, notice that we can't actually insert, but return in the !HeapTupleMayBeUpdated branch. If the caller doesn't error out - and there's certainly callers doing that - we'd "leak" a toasted datum. Unless the transaction aborts, the toasted datum would never be cleaned up, because there's no datum pointing to it, so no heap_delete will ever recurse into the toast datum (via toast_delete()). Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
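[Editor's note: the leak Andres describes can be modeled abstractly: toast cleanup only ever happens by recursing from a heap tuple that points at the datum. A toy illustration in plain Python, with no relation to the real heap/toast code:]

```python
# Toy model: heap tuples reference out-of-line "toast" data by id, and the
# only path that reclaims a toast datum is deleting a tuple that points at it.
toast_table = {101: "out-of-line datum"}
heap = {}  # tuple id -> toast id it references

def heap_delete(tid):
    """Deleting a tuple recurses into (and reclaims) its toast datum."""
    toast_id = heap.pop(tid)
    toast_table.pop(toast_id, None)

# Scenario: the update toasted the new row version (datum 101) but then
# returned in the !HeapTupleMayBeUpdated branch before inserting the tuple.
# Nothing in the heap references datum 101, so no heap_delete can reach it.
assert 101 in toast_table
assert 101 not in heap.values()  # orphaned: leaked unless the xact aborts
```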
Re: [HACKERS] Reviewing freeze map code
On Mon, Jun 20, 2016 at 4:24 PM, Andres Freund wrote: > On 2016-06-20 16:10:23 -0400, Robert Haas wrote: >> What exactly is the point of all of that already_marked stuff? > > Preventing the old tuple from being locked/updated by another backend, > while unlocking the buffer. > >> I >> mean, suppose we just don't do any of that before we go off to do >> toast_insert_or_update and RelationGetBufferForTuple. Eventually, >> when we reacquire the page lock, we might find that somebody else has >> already updated the tuple, but couldn't that be handled by >> (approximately) looping back up to l2 just as we do in several other >> cases? > > We'd potentially have to undo a fair amount more work: the toasted data > would have to be deleted and such, just to retry. Which isn't going to > super easy, because all of it will be happening with the current cid (we > can't just increase CommandCounterIncrement() for correctness reasons). Why would we have to delete the TOAST data? AFAIUI, the tuple points to the TOAST data, but not the other way around. So if we change our mind about where to put the tuple, I don't think that requires re-TOASTing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 5:36 PM, David G. Johnston wrote: >> Yeah, no kidding. We had a perfectly good consensus to keep this at >> 9.6 on pgsql-advocacy, and then later we had a revised consensus to >> retitle it to 10.0, > > If -advocacy had changed their mind before beta1 was tagged this may have > played out a bit differently...maybe. In any case 9.6 was a foregone > conclusion given -advocacy's timeline and because of its independence from > -hackers (Tom, the tagger, specifically). One thing to keep in mind is that I did not realize there was any urgency to make the decision before beta1, because Tom had previous made reference to renumbering this version to 10.0 *over the summer* depending on how awesome we then thought parallel query was. I assumed that he would not have made this comment if he had an objection to renumbering the release after beta1. Surely he didn't think we were going to do beta1 in the fall. It was only after beta1 had been released that anyone said post-beta1 was too late. Had that been brought up earlier, the discussion might have gone differently, too. Also, it is not the case that because Tom applies the tag, he also gets veto power over the version numbering scheme. That's not how decision-making in this community works. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel.c is not marked as test covered
On Mon, Jun 20, 2016 at 4:52 PM, David G. Johnston wrote: > Internal or external I do think the number and type of flags described here, > for the purposes described, seems undesirable from an architectural > standpoint. Well, that seems like a bold statement to me, because I think that one really has to understand why flags got added before deciding whether there are too many of them, and it's not clear to me that you do. What we're talking about here is ONE flag, which currently controls whether the leader will participate in running Gather's child plan. What happens when the flag is set to "leader should not participate" and no workers are available? The current behavior is that the leader runs the plan in lieu of the workers, and Tom is proposing to change it so that we wait for a worker to become available instead. That has the advantage of being better for testing, but the disadvantage of being possibly less useful for other purposes. Neither of us are proposing to eliminate the flag, which would only serve to cripple test coverage, and I will venture to say that neither Tom nor anyone else who has spent much time looking at the way any part of the system works would argue that a Node type with ONE flag has too many flags. I am usually quite happy when people other than senior hackers weigh in on threads here, especially about user-visible behavior, because senior hackers can be quite myopic. We spend most of our time developing the system rather than using it, and sometimes our notion of what is useful or acceptable is distorted because of it. Moreover, none of us are so smart that we can't be wrong, so a gut check from somebody else is often helpful. But, of course, an opinion that proceeds from a false premise doesn't provide useful guidance. Many people realize this, which is why we tend to get entirely too many opinions on questions like "should we change the version number?" and far too few on questions like "how should we refactor heap_update?". 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
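[Editor's note: the two policies Robert contrasts for the one flag can be sketched as pseudologic. Names here are hypothetical, purely for illustration; this is not the executor code:]

```python
# Who runs a Gather node's child plan, under the single flag being discussed.
# wait_for_worker=False models current behavior; True models Tom's proposal.
def run_gather(leader_participates, workers_available, wait_for_worker):
    if leader_participates:
        return "leader and any workers run the plan"
    if workers_available > 0:
        return "workers run the plan"
    # Flag says "leader should not participate" but no worker is available:
    return "wait for a worker" if wait_for_worker else "leader runs it anyway"

assert run_gather(False, 0, wait_for_worker=False) == "leader runs it anyway"
assert run_gather(False, 0, wait_for_worker=True) == "wait for a worker"
```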
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 5:08 PM, Robert Haas wrote: > On Mon, Jun 20, 2016 at 4:53 PM, Joshua D. Drake > wrote: > > Or we could adopt the very reasonable and practical policy of: > > > > The current versioning scheme isn't broke, so we aren't going to fix it. > > Yeah, no kidding. We had a perfectly good consensus to keep this at > 9.6 on pgsql-advocacy, and then later we had a revised consensus to > retitle it to 10.0, If -advocacy had changed their mind before beta1 was tagged this may have played out a bit differently...maybe. In any case 9.6 was a foregone conclusion given -advocacy's timeline and because of its independence from -hackers (Tom, the tagger, specifically). but as soon as the discussion came over to > pgsql-hackers nothing would do but that we relitigate the whole thing > ignoring the previous discussion because it wasn't on pgsql-hackers. > Why -hackers is the right place to decide on the marketing version > number rather than -advocacy went unexplained, of course. I had the same thought. Yet no one even tried to move the discussion back there. And at this point PGCon was so close that I personally figured it would be discussed. It was, and an announcement was made that a decision was reached. No one voiced an objection so those not at PGCon (or at I) figured for better or worse we're going to version 10 (expected) and to address the expressed desire reduce the public confusion surrounding our major-major-minor version scheme we would instead switch to a more traditional major-minor scheme (acceptable). Now we have > a new consensus, at least the third if not the fourth or fifth, about > what to do on this topic, and since Tom likes this outcome better he'd > like to stop discussion right here. This portion of the thread was mostly a technical concern - how do we display the version in human-readable and machine-readable formats. 
I'd agree with your earlier idea that if you really want to open up the decision between 10.n.x and 10.x, start a new thread on -advocacy. Tally up those at PGCon as a starting point and let others toss in their lot from there. At this point I'm only seeing rehashing of the same arguments. Let other people make their, informed or otherwise, opinions known if you feel that the group at PGCon is not a fair sampling. > A two-part version numbering > scheme may or may not be for the best, but the idea that we're making > that decision in any principled way, or that the consensus on this new > system is any more valid than any of the previous consensus, doesn't > ring true to me. The idea that this discussion is not fixing any real > problem, though -- that rings true. > > You've hijacked the meaning of "this" here. Most of "this" thread ended up explaining the difference between version() and current_setting('server_version_num'). That needs to be addressed unless we revert back to the status quo. That isn't this thread, though. This thread assumes we are going to 10.0 and semantically losing the middle digit. I'd find it odd to argue against 10.0 on the basis that we'd be displaying 10.0.x in the output of version(). That seems like such an inconsequential detail in the larger scheme of things. Either you accept 10.0 and we pick a human-readable output or you don't and the decision doesn't come into play. I'm happy with how things are (+0.5 for we should output 10.0.x for version()) so I'm stopping here. David J.
Re: [HACKERS] 10.0
On 06/20/2016 02:14 PM, Merlin Moncure wrote: On Mon, Jun 20, 2016 at 4:08 PM, Robert Haas wrote: On Mon, Jun 20, 2016 at 4:53 PM, Joshua D. Drake wrote: Or we could adopt the very reasonable and practical policy of: The current versioning scheme isn't broke, so we aren't going to fix it. The idea that this discussion is not fixing any real problem, though -- that rings true. sure -- it's my fault for starting the conversation back up. I was wondering about supporting older version checks, but only because I was unaware of the 'machine' variant of the version check (server_version_num), which properly supports numerical ordering for historical versions. If there's anything to do here, maybe we ought to document that server_version_num should be used for checking version a little more strongly. Judging by google searching, this is as not widely known as it should be. I certainly had no idea it even existed until you displayed the query. I have always used version() but then, I am not a -hacker. Sincerely, JD merlin -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] forcing a rebuild of the visibility map
On Mon, Jun 20, 2016 at 4:57 PM, Tom Lane wrote: > Andres Freund writes: >> I also don't see why it's a good idea to have knowledge about how to >> truncate the visibility map outside of visibilitymap.c. Having that in a >> contrib module just seems like a modularity violation. > > That seems like a pretty good argument. I agree that's a good argument but it passes over one or two points which motivated my approach. Most of the work of truncating the visibility map is, in fact, encapsulated by visibilitymap_truncate, so the only knowledge we're exporting to the contrib module is what WAL record to emit. I put that in the caller of visibilitymap_truncate because the only existing caller of visibilitymap_truncate is RelationTruncate, which also knows what WAL record to emit on behalf of visibilitymap.c. So I don't think I've exported any more knowledge from visibilitymap.c than was the case previously. That having been said, if somebody wants to whack this around, I am not going to put up a big fight. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 4:08 PM, Robert Haas wrote: > On Mon, Jun 20, 2016 at 4:53 PM, Joshua D. Drake > wrote: >> Or we could adopt the very reasonable and practical policy of: >> >> The current versioning scheme isn't broke, so we aren't going to fix it. > > The idea that this discussion is not fixing any real > problem, though -- that rings true. sure -- it's my fault for starting the conversation back up. I was wondering about supporting older version checks, but only because I was unaware of the 'machine' variant of the version check (server_version_num), which properly supports numerical ordering for historical versions. If there's anything to do here, maybe we ought to document that server_version_num should be used for checking version a little more strongly. Judging by google searching, this is not as widely known as it should be. merlin
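[A sketch, not from the original messages, of why server_version_num is the safer thing to compare: the human-readable version string sorts lexicographically, which breaks once a component reaches two digits, while the machine form is a single integer (e.g. 90600 for 9.6, 100004 for 10.4) that orders correctly.]

```python
# Comparing PostgreSQL versions: string comparison vs server_version_num.
# The integer form (as returned by SHOW server_version_num) always orders
# correctly; the display string does not.

def check_minimum(server_version_num: int, minimum: int) -> bool:
    """True if the server is at least the required version."""
    return server_version_num >= minimum

# String comparison gets this wrong: "10.0" sorts before "9.6".
assert ("10.0" < "9.6") is True

# Integer comparison gets it right: 10.0 (100000) >= 9.6 (90600).
assert check_minimum(100000, 90600) is True
assert check_minimum(90600, 100000) is False
```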
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 4:55 PM, Tom Lane wrote: > No, the argument for it was that we'd no longer have to have the annual > discussions about "is it 10.0 yet?". WHAT annual argument? Did anyone even argue that any 9.x release prior to 9.6 deserved to be called 10.0? Maybe somebody suggested that for 9.2 and it generated, like, four emails? I certainly don't remember any discussion that remotely approached the amount of time we've spent litigating both the version number and the version numbering scheme in the last few months. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 4:53 PM, Joshua D. Drake wrote: > Or we could adopt the very reasonable and practical policy of: > > The current versioning scheme isn't broke, so we aren't going to fix it. Yeah, no kidding. We had a perfectly good consensus to keep this at 9.6 on pgsql-advocacy, and then later we had a revised consensus to retitle it to 10.0, but as soon as the discussion came over to pgsql-hackers nothing would do but that we relitigate the whole thing ignoring the previous discussion because it wasn't on pgsql-hackers. Why -hackers is the right place to decide on the marketing version number rather than -advocacy went unexplained, of course. Now we have a new consensus, at least the third if not the fourth or fifth, about what to do on this topic, and since Tom likes this outcome better he'd like to stop discussion right here. A two-part version numbering scheme may or may not be for the best, but the idea that we're making that decision in any principled way, or that the consensus on this new system is any more valid than any of the previous ones, doesn't ring true to me. The idea that this discussion is not fixing any real problem, though -- that rings true. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] forcing a rebuild of the visibility map
Andres Freund writes: > I also don't see why it's a good idea to have knowledge about how to > truncate the visibility map outside of visibilitymap.c. Having that in a > contrib module just seems like a modularity violation. That seems like a pretty good argument. regards, tom lane
Re: [HACKERS] 10.0
Alvaro Herrera writes: > Tom Lane wrote: >> If we were going to do it like that, I would argue for "every ten years >> like clockwork", e.g. 10.0.x is next after 9.9.x. But in point of fact, >> Robert, you already made your case for that approach and nobody else >> cared for it. > I voted for this approach initially too, and I think it has merit -- > notably, that it would stop this discussion. It was said that moving > to two-part numbers would stop all discussion, but it seems to have had > exactly the opposite effect. No, the argument for it was that we'd no longer have to have the annual discussions about "is it 10.0 yet?". There was no claim that getting there would be uncontroversial, only that things would be quieter once we did get there. Considering that same long-term viewpoint, I'm not very happy about the idea of "let's reserve the middle digit for compatible feature backports", because the minute we set up a numbering system like that, everybody and his brother will be arguing for making a branch on which to backport their favorite more-or-less-compatible new feature. You as well as others pointed out that we don't have the manpower to actually support any such thing ... so let's not open the door to it. If we do arrive at a consensus that going to simply "10.0, 10.1, etc" would be too much change, then I'd be in favor of adopting the every-ten-year rule instead, in hopes that we can at least shut down future "is it 10.0 yet" threads more quickly. But that really does nothing at all to fix the confusion so often shown by outsiders about the significance of our version numbers. regards, tom lane
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 4:20 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Jun 20, 2016 at 4:00 PM, David G. Johnston >> wrote: >>> 10.x is the desired output. > >> 10.x is the output that some people desire. A significant number of >> people, including me, would prefer to stick with the current >> three-part versioning scheme, possibly with some change to the >> algorithm for bumping the first digit (e.g. every 5 years like >> clockwork). > > If we were going to do it like that, I would argue for "every ten years > like clockwork", e.g. 10.0.x is next after 9.9.x. But in point of fact, > Robert, you already made your case for that approach and nobody else > cared for it. Either there's a meaningful difference between the first > and second parts of the number, or there is not. If there is not, why > have separate parts? It can only cause confusion ... as this whole > thread, and its many many predecessors, amply illustrate. That's not how I remember it. At the Ottawa developer meeting, there were more votes for changing to a two-part versioning scheme than there were for retaining a three-part versioning scheme, but my recollection is that it was not overwhelmingly so. I'm pretty sure it was less than a 2/3 majority in favor of changing. Furthermore, essentially everyone in the room, including people who wanted to stick with a three-part scheme, was in favor of the next version's first component being 10. I do not recall there being a strong consensus among the people who wanted the next version to be 10.0.0 on how to decide when to go to 11.0.0, though. Various proposals were offered and most of them got no more than one vote. But saying that nobody other than me thought we should stick with a three-part scheme is revisionist history.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 10.0
On 06/20/2016 01:41 PM, Alvaro Herrera wrote: > Tom Lane wrote: >> Robert Haas writes: >>> On Mon, Jun 20, 2016 at 4:00 PM, David G. Johnston wrote: >> If we were going to do it like that, I would argue for "every ten years like clockwork", e.g. 10.0.x is next after 9.9.x. But in point of fact, Robert, you already made your case for that approach and nobody else cared for it. > I voted for this approach initially too, and I think it has merit -- notably, that it would stop this discussion. It was said that moving to two-part numbers would stop all discussion, but it seems to have had exactly the opposite effect. Or we could adopt the very reasonable and practical policy of: The current versioning scheme isn't broke, so we aren't going to fix it. Put that in the FAQ and wave at it like we do with hints ala Oracle. It is obvious from this thread alone that there is really no consensus. Sincerely, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
Re: [HACKERS] parallel.c is not marked as test covered
On Mon, Jun 20, 2016 at 4:03 PM, Robert Haas wrote: > On Mon, Jun 20, 2016 at 3:29 PM, David G. Johnston > wrote: > > The entire theory here looks whacked - and seems to fall into the "GUCs > > controlling results" bucket of undesirable things. > > As far as I can see, this entire email is totally wrong and off-base, > because the whole thing seems to be written on the presumption that > single_copy is a GUC, when it's actually a structure member. If there > was some confusion about that, you could have spent 5 seconds running > "git grep" before writing this email, or you could have tried "SET > single_copy" and discovered, hey, there's no such GUC. > > Furthermore, I think that describing something that you obviously > haven't taken any time to understand as "whacked" is not very nice. > For that matter, I think that describing something you *have* taken > time to understand as "whacked" is not very nice. > > Point taken. I don't think my entire post depends solely upon this being a GUC, though. I've burned too many brain cells on this already, though, to dive much deeper. Internal or external, I do think the number and type of flags described here, for the purposes described, seems undesirable from an architectural standpoint. I do not and cannot offer up more than that generally, due to knowledge and resource constraints. I tried to frame things up relative to my understanding of existing, non-parallel, idioms, both to understand it better myself and to throw out another POV from a fresh perspective. I'll admit it's one with some drawbacks, but it's offered in good faith. Please do with it as you will and accept my apology for the poor choice of colloquialism. David J.
Re: [HACKERS] forcing a rebuild of the visibility map
On 2016-06-18 11:56:51 -0400, Tom Lane wrote: > Michael Paquier writes: > > On Sat, Jun 18, 2016 at 6:53 AM, Robert Haas wrote: > >> Andres, do you want to explain the nature of your concern further? > > > I am not in his mind, but my guess is that contrib modules are > > sometimes used as template examples by other people, and encouraging > > users to use those routines in modules would increase the risk to > > misuse them, aka badly-formed records that could corrupt the system. That's not it, no. > If Andres' concern is that XLogInsert isn't a very stable API, maybe > we could address that by providing some wrapper function that knows > how to emit the specific kind of record that pg_visibility needs. That's part of the concern I have, yes. It's pretty common that during minor version updates contrib modules are updated before the main server is restarted. Increasing the coupling on something as critical as WAL logging doesn't strike me as a good idea. I also don't see why it's a good idea to have knowledge about how to truncate the visibility map outside of visibilitymap.c. Having that in a contrib module just seems like a modularity violation. To me this should be a wal_log parameter to visibilitymap_truncate(). Andres
Re: [HACKERS] 10.0
Tom Lane wrote: > Robert Haas writes: > > On Mon, Jun 20, 2016 at 4:00 PM, David G. Johnston > > wrote: > >> 10.x is the desired output. > > > 10.x is the output that some people desire. A significant number of > > people, including me, would prefer to stick with the current > > three-part versioning scheme, possibly with some change to the > > algorithm for bumping the first digit (e.g. every 5 years like > > clockwork). > > If we were going to do it like that, I would argue for "every ten years > like clockwork", e.g. 10.0.x is next after 9.9.x. But in point of fact, > Robert, you already made your case for that approach and nobody else > cared for it. I voted for this approach initially too, and I think it has merit -- notably, that it would stop this discussion. It was said that moving to two-part numbers would stop all discussion, but it seems to have had exactly the opposite effect. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 4:14 PM, Robert Haas wrote: > On Mon, Jun 20, 2016 at 4:00 PM, David G. Johnston > wrote: > > 10.x is the desired output. > > 10.x is the output that some people desire. A significant number of > people, including me, would prefer to stick with the current > three-part versioning scheme, possibly with some change to the > algorithm for bumping the first digit (e.g. every 5 years like > clockwork). > > I was speaking for the project/community as a distinct entity and not about any individual contributor. I'm acting as if we're past the point of individual opinions and votes on the decision to go to a two-part versioning scheme. We will still welcome any major revelations that may have gone unconsidered during the decision making but I find that to be unlikely. David J.
Re: [HACKERS] Reviewing freeze map code
On 2016-06-20 16:10:23 -0400, Robert Haas wrote: > What exactly is the point of all of that already_marked stuff? Preventing the old tuple from being locked/updated by another backend, while unlocking the buffer. > I > mean, suppose we just don't do any of that before we go off to do > toast_insert_or_update and RelationGetBufferForTuple. Eventually, > when we reacquire the page lock, we might find that somebody else has > already updated the tuple, but couldn't that be handled by > (approximately) looping back up to l2 just as we do in several other > cases? We'd potentially have to undo a fair amount more work: the toasted data would have to be deleted and such, just to retry. Which isn't going to be super easy, because all of it will be happening with the current cid (we can't just increase CommandCounterIncrement() for correctness reasons). Andres
Re: [HACKERS] 10.0
Robert Haas writes: > On Mon, Jun 20, 2016 at 4:00 PM, David G. Johnston > wrote: >> 10.x is the desired output. > 10.x is the output that some people desire. A significant number of > people, including me, would prefer to stick with the current > three-part versioning scheme, possibly with some change to the > algorithm for bumping the first digit (e.g. every 5 years like > clockwork). If we were going to do it like that, I would argue for "every ten years like clockwork", e.g. 10.0.x is next after 9.9.x. But in point of fact, Robert, you already made your case for that approach and nobody else cared for it. Either there's a meaningful difference between the first and second parts of the number, or there is not. If there is not, why have separate parts? It can only cause confusion ... as this whole thread, and its many many predecessors, amply illustrate. regards, tom lane
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 4:00 PM, David G. Johnston wrote: > 10.x is the desired output. 10.x is the output that some people desire. A significant number of people, including me, would prefer to stick with the current three-part versioning scheme, possibly with some change to the algorithm for bumping the first digit (e.g. every 5 years like clockwork). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Reviewing freeze map code
On Mon, Jun 20, 2016 at 3:33 PM, Andres Freund wrote: >> I'm not sure what to do about this: this part of the heap_update() >> logic has been like this forever, and I assume that if it were easy to >> refactor this away, somebody would have done it by now. > > Well, I think generally nobody seriously looked at actually refactoring > heap_update(), even though that'd be a good idea. But in this instance, > the problem seems relatively fundamental: > > We need to lock the origin page, to do visibility checks, etc. Then we > need to figure out the target page. Even disregarding toasting - which > we could be doing earlier with some refactoring - we're going to have to > release the page level lock, to lock them in ascending order. Otherwise > we'll risk kinda likely deadlocks. We also certainly don't want to nest > the lwlocks for the toast stuff, inside a content lock for the old > tuple's page. > > So far the best idea I have - and it's really not a good one - is to > invent a new hint-bit that tells concurrent updates to acquire a > heavyweight tuple lock, while releasing the page-level lock. If that > hint bit does not require any other modifications - and we don't need an > xid in xmax for this use case - that'll avoid doing all the other > `already_marked` stuff early, which should address the correctness > issue. It's kinda invasive though, and probably has performance > implications. > > Does anybody have a better idea? What exactly is the point of all of that already_marked stuff? I mean, suppose we just don't do any of that before we go off to do toast_insert_or_update and RelationGetBufferForTuple. Eventually, when we reacquire the page lock, we might find that somebody else has already updated the tuple, but couldn't that be handled by (approximately) looping back up to l2 just as we do in several other cases?
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 10.0
> On Jun 20, 2016, at 1:00 PM, David G. Johnston > wrote: > > On Mon, Jun 20, 2016 at 3:08 PM, Mark Dilger wrote: > > > Do you have a problem with the human form and machine forms of the version > > number being different in this respect? I don't - for me the decision of a > > choice for the human form is not influenced by the fact the machine form > > has 6 digits (with leading zeros which the human form elides...). > > I don't have a problem with it if humans always use a two part number. I > don't read > the number 14 as being three parts, nor as being two parts, so it doesn't > matter. > What got me to respond this morning was Josh's comment: > > "Realistically, though, we're more likely to end up with 10.0.1 than 10.1." > > He didn't say "11 than 10.1", he said "10.0.1 than 10.1", which showed > that we > already have a confusion waiting to happen. > > Now, you can try to avoid the confusion by saying that we'll always use all > three > digits of the number rather than just two, or always use two digits rather > than three. > But how do you enforce that? > > You do realize he was referring to machine generated output here? No I don't, nor will anyone who finds that via a google search. That's my point. You core hackers feel perfectly comfortable with that because you understand what you are talking about. Hardly anybody else will. As you suggest, that's my $0.02, and I'm moving on. mark
Re: [HACKERS] parallel.c is not marked as test covered
On Mon, Jun 20, 2016 at 3:29 PM, David G. Johnston wrote: > The entire theory here looks whacked - and seems to fall into the "GUCs > controlling results" bucket of undesirable things. As far as I can see, this entire email is totally wrong and off-base, because the whole thing seems to be written on the presumption that single_copy is a GUC, when it's actually a structure member. If there was some confusion about that, you could have spent 5 seconds running "git grep" before writing this email, or you could have tried "SET single_copy" and discovered, hey, there's no such GUC. Furthermore, I think that describing something that you obviously haven't taken any time to understand as "whacked" is not very nice. For that matter, I think that describing something you *have* taken time to understand as "whacked" is not very nice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 3:08 PM, Mark Dilger wrote: > > > Do you have a problem with the human form and machine forms of the > version number being different in this respect? I don't - for me the > decision of a choice for the human form is not influenced by the fact the > machine form has 6 digits (with leading zeros which the human form > elides...). > > I don't have a problem with it if humans always use a two part number. I > don't read > the number 14 as being three parts, nor as being two parts, so it > doesn't matter. > What got me to respond this morning was Josh's comment: > > "Realistically, though, we're more likely to end up with 10.0.1 than 10.1." > > He didn't say "11 than 10.1", he said "10.0.1 than 10.1", which showed > that we > already have a confusion waiting to happen. > > Now, you can try to avoid the confusion by saying that we'll always use > all three > digits of the number rather than just two, or always use two digits rather > than three. > But how do you enforce that? You do realize he was referring to machine generated output here? That means unless we get careless we have full control over what is output. Enforcement is easy. If in some sense the middle number exists, but is > always zero, then some people will mention it and some won't, with the > result that > 10.0.x and 10.x will be used by different people at different times to > refer to the same > release. > Back to human generated output again... > > The problem that Tom and others mentioned upthread (rather a while ago) is > that > up until now, we've had three digits in the version numbers, but the first > two numbers > really just mean one thing, "major", and the last number means "minor". > IIUC, he > wanted to stop using two separate digits where clearly just one would > suffice. Hence > the proposal to go to things like 10.0, 10.1. 
But others chimed in saying > that we need > to keep it three parts for compatibility reasons, so how about we just > have the middle > number always be zero. My response to that is what I've just said. You > can't do that > without creating ambiguity in future version numbers, because of how > people use > language. If you want to avoid the ambiguity and confusion going forward, > all the > numbers in the scheme must have meaning and not be mere placeholders. I > gave a > suggestion about what the middle number *could* mean, which I don't deny > is what > I'd like best, but it's just a suggestion to avoid the confusion inherent > in people eliding > the middle number sometimes but not other times. > > Given everything else you wrote I find it hard to believe you actually, deep down, think that "definitional meaning" is sufficient here. The reason informed people say "9.2.x" and "9.3.x" is that those are in fact two different things where the middle number changes every year. If we go right from: 10.0.7 to 11.0.0 there will be no added value in distinguishing between 10.0.x and 10.x and so people will not do it - and you said there is no way to force them. That the middle number could, in well defined circumstances, change is immaterial when it will not do so in practice. So, we can ensure the machine output is consistent. That is easy. The only question here is whether we make the machine human-readable output better conform to expected human-generated usage (i.e., 10.x) or do we keep the machine generated human-readable output compatible with the existing format so that machine processing of the string is unaffected. The only way to address your concern is to continue with the status-quo. Make 10.1.x and 10.2.x and so people cannot meaningfully drop the second digit.
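[A sketch, not from the thread, of the ambiguity being argued about here: the same machine-readable version number can be rendered as either human form. The helper names are invented for illustration; under the historical scheme the integer is major1*10000 + major2*100 + minor.]

```python
# One machine version number, two plausible human readings - which is
# exactly the elision problem discussed above.

def as_three_part(vnum: int) -> str:
    """Render the historical three-part form, e.g. 100004 -> '10.0.4'."""
    return f"{vnum // 10000}.{vnum // 100 % 100}.{vnum % 100}"

def as_two_part(vnum: int) -> str:
    """Render with the always-zero middle digit elided, e.g. '10.4'."""
    return f"{vnum // 10000}.{vnum % 100}"

assert as_three_part(100004) == "10.0.4"
assert as_two_part(100004) == "10.4"   # same release, different reading
```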
Since it has been accepted that the status quo has its own problems - namely that the less informed population are already treating our version number as if it is 9.x (instead of 9.5.x) - the change to better conform to societal expectations is desirable. If you're going to vote for status-quo I'd say add your $0.02 and move on; I don't think that is still on the table (unless some real bad reality smacks us in the face). The human generated dynamic is thus a foregone conclusion - the majority are going to call and write our next release as 10.x. The machine readable output will be 10. The human readable output seems to be up for a vote: 10.0.x or 10.x. 10.x is the desired output. Having a list of actual problems likely to face our users if we do this would be helpful. Identifying exactly where this human-readable output string appears would be helpful. Obviously the "version()" command but where else? I do think Josh has the right of it, though. Humans seeing 10.0.4 in the version() output will easily adapt once they understand why we left it that way. Machines cannot adapt as readily. The fact that we advertise 10.4 while our version number is 10.0.4 in one small corner of our system is a minor irritant compared to the potential for
Re: [HACKERS] parallel.c is not marked as test covered
On Mon, Jun 20, 2016 at 1:13 PM, Tom Lane wrote: > Robert Haas writes: >> On Sun, Jun 19, 2016 at 10:23 PM, Tom Lane wrote: >>> Personally, I'm +1 for such tinkering if it makes the feature either more >>> controllable or more understandable. After reading the comments at the >>> head of nodeGather.c, though, I don't think that single_copy is either >>> understandable or useful, and merely renaming it won't help. Apparently, >>> it runs code in the worker, except when it doesn't, and even when it does, >>> it's absolutely guaranteed to be a performance loss because the leader is >>> doing nothing. What in the world is the point? > >> The single_copy flag allows a Gather node to have a child plan which >> is not intrinsically parallel. > > OK, but I'm very dubious of your claim that this has any use except for > testing purposes. It certainly has no such use today. Therefore, the > behavior of falling back to running in the leader seems like an > anti-feature to me. If we want to test running in a worker, then we > want to test that, not maybe test it. > > I propose that the behavior we actually want here is to execute in > a worker, full stop. If we can't get one, wait till we can. If we > can't get one because no slots have been allocated at all, fail. > That would make the behavior deterministic enough for the regression > tests to rely on it. I agree that for force_parallel_mode testing, that behavior would be useful. I am also pretty confident we're going to want the behavior where the leader runs the plan if, and only if, no workers can be obtained for other purposes. However, I agree that's not essential today. > And while I don't know what this mode should be called, I'm pretty sure > that neither "single_copy" nor "pipeline" are useful descriptions. Maybe we should make this an enum rather than a boolean, since there seem to be more than two useful behaviors. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
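[A sketch of the enum idea floated above, not actual executor code; the mode and helper names are invented for illustration. The point is that the behaviors discussed - run anywhere, run in exactly one process with leader fallback, run in a worker full stop - are more than a boolean can express.]

```python
# Toy model of Gather execution modes as an enum rather than a boolean.
from enum import Enum, auto

class GatherMode(Enum):
    PARALLEL = auto()     # any number of processes may run the child plan
    SINGLE_COPY = auto()  # one process runs it; leader participates only as fallback
    WORKER_ONLY = auto()  # run in a worker, full stop; wait or fail otherwise

def leader_may_execute(mode: GatherMode, workers_available: bool) -> bool:
    """Decide whether the leader itself runs the child plan."""
    if mode is GatherMode.PARALLEL:
        return True
    if mode is GatherMode.SINGLE_COPY:
        return not workers_available  # fall back only when no worker exists
    return False                      # WORKER_ONLY: never run in the leader

assert leader_may_execute(GatherMode.SINGLE_COPY, workers_available=True) is False
assert leader_may_execute(GatherMode.SINGLE_COPY, workers_available=False) is True
assert leader_may_execute(GatherMode.WORKER_ONLY, workers_available=False) is False
```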
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 3:07 PM, Gražvydas Valeika wrote: > Hi, > > I recently bumped into http://semver.org/ > > It can be interesting to participants of this discussion. > > Especially motivation for minor version (middle number). > > While we appreciate the comment, this is the third (maybe fourth) time this has come up during this discussion - the last being 20 emails and 3 days ago. In short, we don't and never will be semver compliant, so if anything its numbering protocol is something to avoid, not embrace. David J.
Re: [HACKERS] Reviewing freeze map code
On 2016-06-15 08:56:52 -0400, Robert Haas wrote: > Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2, > and CTID without logging anything or clearing the all-visible flag and > then releases the lock on the heap page to go do some more work that > might even ERROR out. Only if that other work all goes OK do we > relock the page and perform the WAL-logged actions. > > That doesn't seem like a good idea even in existing releases, because > you've taken a tuple on an all-visible page and made it not > all-visible, and you've made a page modification that is not > necessarily atomic without logging it. Right, that's broken. > I'm not sure what to do about this: this part of the heap_update() > logic has been like this forever, and I assume that if it were easy to > refactor this away, somebody would have done it by now. Well, I think generally nobody seriously looked at actually refactoring heap_update(), even though that'd be a good idea. But in this instance, the problem seems relatively fundamental: We need to lock the origin page, to do visibility checks, etc. Then we need to figure out the target page. Even disregarding toasting - which we could be doing earlier with some refactoring - we're going to have to release the page level lock, to lock them in ascending order. Otherwise we'll risk kinda likely deadlocks. We also certainly don't want to nest the lwlocks for the toast stuff, inside a content lock for the old tuple's page. So far the best idea I have - and it's really not a good one - is to invent a new hint-bit that tells concurrent updates to acquire a heavyweight tuple lock, while releasing the page-level lock. If that hint bit does not require any other modifications - and we don't need an xid in xmax for this use case - that'll avoid doing all the other `already_marked` stuff early, which should address the correctness issue. It's kinda invasive though, and probably has performance implications. Does anybody have a better idea?
Regards, Andres
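[A toy sketch, not PostgreSQL code, of the deadlock-avoidance rule the message alludes to when it says "lock them in ascending order": when two pages must be locked, always acquire them in ascending block-number order, so two backends updating the same pair of pages can never wait on each other in opposite orders.]

```python
# Lock two heap pages in a globally consistent (ascending block number)
# order so an AB/BA deadlock between concurrent updaters is impossible.
import threading

page_locks: dict[int, threading.Lock] = {}  # block number -> lock

def get_lock(blkno: int) -> threading.Lock:
    return page_locks.setdefault(blkno, threading.Lock())

def lock_pair(old_blk: int, new_blk: int) -> None:
    """Acquire both page locks, lower block number first."""
    first, second = sorted((old_blk, new_blk))
    get_lock(first).acquire()
    if second != first:
        get_lock(second).acquire()

def unlock_pair(old_blk: int, new_blk: int) -> None:
    first, second = sorted((old_blk, new_blk))
    if second != first:
        get_lock(second).release()
    get_lock(first).release()

lock_pair(7, 3)   # acquired as 3 then 7, regardless of argument order
unlock_pair(7, 3)
assert not get_lock(3).locked() and not get_lock(7).locked()
```

The cost this thread is wrestling with is visible in the sketch: if the old page's lock is already held and the target turns out to have a lower block number, the old lock must be released and reacquired, and anything done under it (the `already_marked` state) has to survive that window.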
Re: [HACKERS] parallel.c is not marked as test covered
On Mon, Jun 20, 2016 at 12:06 PM, Robert Haas wrote: > On Sun, Jun 19, 2016 at 10:23 PM, Tom Lane wrote: > >> although I fear we > >> might be getting to a level of tinkering with parallel query that > >> starts to look more like feature development. > > > > Personally, I'm +1 for such tinkering if it makes the feature either more > > controllable or more understandable. After reading the comments at the > > head of nodeGather.c, though, I don't think that single_copy is either > > understandable or useful, and merely renaming it won't help. Apparently, > > it runs code in the worker, except when it doesn't, and even when it > does, > > it's absolutely guaranteed to be a performance loss because the leader is > > doing nothing. What in the world is the point? > > The single_copy flag allows a Gather node to have a child plan which > is not intrinsically parallel. For example, consider these two plans: > > Gather > -> Parallel Seq Scan > > Gather > -> Seq Scan > > The first plan is safe regardless of the setting of the single-copy > flag. If the plan is executed in every worker, the results in > aggregate across all workers will add up to the results of a > non-parallel sequential scan of the table. The second plan is safe > only if the # of workers is 1 and the single-copy flag is set. If > either of those things is not true, then more than one process might > try to execute the sequential scan, and the result will be that you'll > get N copies of the output, where N = (# of parallel workers) + > (leader also participates ? 1 : 0). > > For force_parallel_mode = {on, regress}, the single-copy behavior is > essential. We can run all of those plans inside a worker, but only > because we know that the leader won't also try to run those same > plans. > > The entire theory here looks whacked - and seems to fall into the "GUCs controlling results" bucket of undesirable things. 
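[A toy model, not executor code, of the point quoted above: if a child plan that is not intrinsically parallel runs in every process, the gathered output contains one full copy per process, whereas a "parallel" scan hands each row out exactly once.]

```python
# Simulate gathering rows from N workers for a non-partial vs partial scan.

table = list(range(10))

def seq_scan():
    """Not intrinsically parallel: every process scans the whole table."""
    return list(table)

def parallel_seq_scan(worker_id: int, nworkers: int):
    """Partial plan: rows are handed out disjointly across workers."""
    return table[worker_id::nworkers]

nworkers = 3
gathered_wrong = [row for w in range(nworkers) for row in seq_scan()]
gathered_right = [row for w in range(nworkers)
                  for row in parallel_seq_scan(w, nworkers)]

assert len(gathered_wrong) == nworkers * len(table)  # N copies of the output
assert sorted(gathered_right) == table               # each row exactly once
```

This is why the second plan shape in the quoted message is only safe with one worker and the single-copy flag set: any other configuration multiplies the output.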
Is this GUC enabled by a compile-time directive, or otherwise protected from misuse in production? I'm having trouble sounding smart here about what is bothering me but basically the parallel infrastructure (i.e., additional workers) shouldn't even be used for "Seq Scan" and a "Seq Scan" under a Gather should behave no differently than a "Parallel Seq Scan" under a Gather where all work is done by the leader because no workers were available to help. At worst this behavior should be an implementation artifact of force_parallel_mode={on,regress}; at best the Gather node would simply have this intelligence on, always, so as not to silently generate bogus results in a mis-configured or buggy setup. > [...] > > > Actually, though, the behavior I really want the single_copy flag to > embody is not so much "only one process runs this" but "leader does > not participate unless there are no workers", which is the same thing > only when the budgeted number of workers is one. This sounds an awful lot like a planner hint, especially since it defaults to off. > This is useful > because of plans like this: > > Finalize HashAggregate > -> Gather > -> Partial HashAggregate > -> Hash Join > -> Parallel Seq Scan on large_table > -> Hash > -> Seq Scan on another_large_table > > Unless the # of groups is very small, the leader actually won't > perform very much of the parallel-seq-scan on large_table, because > it'll be too busy aggregating the results from the other workers. > However, if it ever reaches a point where the Gather can't read a > tuple from one of the workers immediately, which is almost certain to > occur right at the beginning of execution, it's going to go build a > copy of the hash table so that it can "help" with the hash join. By > the time it finishes, the workers will have done the same and be > feeding it results, and it will likely get little use out of the copy > that it built itself. But it will still have gone to the effort of > building it. 
> > For 10.0, Thomas Munro has already done a bunch of work, and will be > doing more work, so that we can build a shared hash table, rather than > one copy per worker. That's going to be better when the table is > large anyway, so maybe this particular case won't matter so much. But > in general when a partial path has a substantial startup cost, it may > be better for the leader not to get involved. So have the Gather node understand this and act accordingly. This is also quite different than the "we'll get wrong results" problem described above but which this GUC also attempts to solve. I'm inclined to believe three things: 1) We need a test mode whereby we guarantee at least one worker is used for processing. 2) Gather needs to be inherently smart enough to accept data from a non-parallel source. 3) Gather needs to use its knowledge (hopefully it has some) of partial plan startup costs and worker availability to decide whether it wants to participate in both hunting and gathering. It should make this decision once at startup and stick with it.
Re: [HACKERS] [sqlsmith] OOM crash in plpgsql_extra_checks_check_hook
Andreas Seltenreich writes: > Just had a parallel worker of a memory-starved instance of sqlsmith > crash. plpgsql_extra_checks_check_hook forgot to check the result of > its malloc call here: Good catch, will fix. Thanks! regards, tom lane
Re: [HACKERS] 10.0
> On Jun 20, 2016, at 11:30 AM, David G. Johnston > wrote: > > On Mon, Jun 20, 2016 at 2:14 PM, Mark Dilger wrote: > > > On Jun 20, 2016, at 11:06 AM, Joshua D. Drake > > wrote: > > > > On 06/20/2016 10:45 AM, Mark Dilger wrote: > > > >>> Now maybe you have some other idea in mind, but I don't quite > >>> understand what it is. > >> > >> I do, indeed, and here it is: > >> > >> When considering whether to *back port* a change, we typically do so > >> on the basis that bug fixes are back ported, features are not. In my > >> proposed versioning scheme, you could back port any feature that is > >> compatible with the old version, and bump the middle number to alert > >> users that you've not just back ported a bug fix, but a feature. Any new > >> features that are not compatible don't get back ported. > >> > >> If you fix a bug on master during development, you can back port that > >> as well. But instead of bumping the middle number, you bump the last > >> number. > >> > >> Somebody running a version that is three major versions back could > >> still get the advantage of new features, so long as those new features > >> are not incompatible. It's frequently not nearly so easy to run pg_upgrade > >> as it is to run rpm -U. You get downtime either way, but the elapsed > >> time of that downtime is orders of magnitude different. > >> > >> Someone else running that same version from three major versions ago > >> can take a more conservative policy, and only upgrade bug fixes, and > >> forego the back ported features. > >> > >> You still have one major version release per year. At that time, you can > >> also release back-port versions of prior major versions. > > > > OFFLIST: > > > > You are fighting a losing if noble battle. > > I think all my emails on this subject have been seriously misunderstood. > Perhaps the problem is that I don't understand some critical issue. 
Can > you please help me understand this part: > > It seems like people want releases, starting with 10.0, to be named things > like 10.0, 10.1, 10.2,..., but for the purposes of communicating with programs > like nagios refer to them as 10.0.0, 10.0.1, 10.0.2 > > Is that right? > > That's the part that really annoys me, and I keep trying to argue for not > doing > that, and people keep replying to other parts of my messages rather than > replying to the core part of what I am saying. > > Why would any sensible person want a release to sometimes be called > "10.4", but the exact same release sometimes referred to as "10.0.4"? > This is just going to confuse average software users, as they would never > expect that 10.4 and 10.0.4 are the same thing. > > Have I misunderstood what is being proposed? > > The software is only ever going to report 10.0.4 OR 10.4. The area of > concern is that people are going to get annoyed at saying: "that was fixed in > 10.0.4" and so mailing lists and other forums where people write the numbers > instead of a computer are going to be littered with: "that was fixed in 10.4". > > So now human speak and machine speak are disjointed. > > The machine form output for that release is going to be 100004 regardless of > the decision to make the human form 10.4 or 10.0.4 > > Do you have a problem with the human form and machine forms of the version > number being different in this respect? I don't - for me the decision of a > choice for the human form is not influenced by the fact the machine form has > 6 digits (with leading zeros which the human form elides...). I don't have a problem with it if humans always use a two part number. I don't read the number 100004 as being three parts, nor as being two parts, so it doesn't matter. What got me to respond this morning was Josh's comment: "Realistically, though, we're more likely to end up with 10.0.1 than 10.1." 
He didn't say "11 than 10.1", he said "10.0.1 than 10.1", which showed that we already have a confusion waiting to happen. Now, you can try to avoid the confusion by saying that we'll always use all three digits of the number rather than just two, or always use two digits rather than three. But how do you enforce that? If in some sense the middle number exists, but is always zero, then some people will mention it and some won't, with the result that 10.0.x and 10.x will be used by different people at different times to refer to the same release. The problem that Tom and others mentioned upthread (rather a while ago) is that up until now, we've had three digits in the version numbers, but the first two numbers really just mean one thing, "major", and the last number means "minor". IIUC, he wanted to stop using two separate digits where clearly just one would suffice. Hence the proposal to go to things like 10.0, 10.1. But others chimed in saying that we need to keep it three parts for compatibility reasons, so how about we just have the middle number always be zero. My response to that is what I've just said.
[HACKERS] [sqlsmith] OOM crash in plpgsql_extra_checks_check_hook
Just had a parallel worker of a memory-starved instance of sqlsmith crash. plpgsql_extra_checks_check_hook forgot to check the result of its malloc call here:

Core was generated by `postgres: bgworker: parallel worker for PID 5905 '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  plpgsql_extra_checks_check_hook (newvalue=, extra=0x7fff7fe31a58, source=) at pl_handler.c:113
113         *myextra = extrachecks;
(gdb) bt
#0  plpgsql_extra_checks_check_hook (newvalue=, extra=0x7fff7fe31a58, source=) at pl_handler.c:113
#1  0x0080173f in call_string_check_hook (newval=0x7fff7fe31a50, extra=, source=, elevel=15, conf=, conf=) at guc.c:9779
#2  0x008029b8 in InitializeOneGUCOption (gconf=0x4) at guc.c:4546
#3  0x00804dbc in define_custom_variable (variable=0x2cb6ef0) at guc.c:7466
#4  0x00805862 in DefineCustomStringVariable (name=name@entry=0x7f803cbfe011 "plpgsql.extra_warnings", short_desc=short_desc@entry=0x7f803cbfe1f8 "List of programming constructs that should produce a warning.", long_desc=long_desc@entry=0x0, valueAddr=valueAddr@entry=0x7f803ce070d8 , bootValue=bootValue@entry=0x7f803cbfdf78 "none", context=context@entry=PGC_USERSET, flags=1, check_hook=0x7f803cbe9700 , assign_hook=0x7f803cbe96e0 , show_hook=0x0) at guc.c:7733
#5  0x7f803cbe99ea in _PG_init () at pl_handler.c:173
#6  0x007f1bcb in internal_load_library (libname=libname@entry=0x7f8040cee14d ) at dfmgr.c:276
#7  0x007f2738 in RestoreLibraryState (start_address=0x7f8040cee14d ) at dfmgr.c:741
#8  0x004e61c0 in ParallelWorkerMain (main_arg=) at parallel.c:985
#9  0x00684072 in StartBackgroundWorker () at bgworker.c:726
#10 0x0068f142 in do_start_bgworker (rw=0x2cb5230) at postmaster.c:5535
#11 maybe_start_bgworker () at postmaster.c:5709
#12 0x0068fb96 in sigusr1_handler (postgres_signal_arg=) at postmaster.c:4971
#13 <signal handler called>
#14 0x7f8040091ac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#15 0x0046c31f in ServerLoop () at postmaster.c:1657
#16 0x00690fc7 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x2c8c620) at postmaster.c:1301
#17 0x0046d96d in main (argc=4, argv=0x2c8c620) at main.c:228
Re: [HACKERS] 10.0
Hi, I recently bumped into http://semver.org/ It can be interesting to participants of this discussion. Especially motivation for minor version (middle number). Best, Grazvydas On Mon, Jun 20, 2016 at 9:30 PM, David G. Johnston < david.g.johns...@gmail.com> wrote: > On Mon, Jun 20, 2016 at 2:14 PM, Mark Dilger > wrote: > >> >> > On Jun 20, 2016, at 11:06 AM, Joshua D. Drake >> wrote: >> > >> > On 06/20/2016 10:45 AM, Mark Dilger wrote: >> > >> >>> Now maybe you have some other idea in mind, but I don't quite >> >>> understand what it is. >> >> >> >> I do, indeed, and here it is: >> >> >> >> When considering whether to *back port* a change, we typically do so >> >> on the basis that bug fixes are back ported, features are not. In my >> >> proposed versioning scheme, you could back port any feature that is >> >> compatible with the old version, and bump the middle number to alert >> >> users that you've not just back ported a bug fix, but a feature. Any >> new >> >> features that are not compatible don't get back ported. >> >> >> >> If you fix a bug on master during development, you can back port that >> >> as well. But instead of bumping the middle number, you bump the last >> >> number. >> >> >> >> Somebody running a version that is three major versions back could >> >> still get the advantage of new features, so long as those new features >> >> are not incompatible. It's frequently not nearly so easy to run >> pg_upgrade >> >> as it is to run rpm -U. You get downtime either way, but the elapsed >> >> time of that downtime is orders of magnitude different. >> >> >> >> Someone else running that same version from three major versions ago >> >> can take a more conservative policy, and only upgrade bug fixes, and >> >> forego the back ported features. >> >> >> >> You still have one major version release per year. At that time, you >> can >> >> also release back-port versions of prior major versions. 
>> > >> > OFFLIST: >> > >> > You are fighting a losing if noble battle. >> >> I think all my emails on this subject have been seriously misunderstood. >> Perhaps the problem is that I don't understand some critical issue. Can >> you please help me understand this part: >> >> It seems like people want releases, starting with 10.0, to be named things >> like 10.0, 10.1, 10.2,..., but for the purposes of communicating with >> programs >> like nagios refer to them as 10.0.0, 10.0.1, 10.0.2 >> >> Is that right? >> >> That's the part that really annoys me, and I keep trying to argue for not >> doing >> that, and people keep replying to other parts of my messages rather than >> replying to the core part of what I am saying. >> >> Why would any sensible person want a release to sometimes be called >> "10.4", but the exact same release sometimes referred to as "10.0.4"? >> This is just going to confuse average software users, as they would never >> expect that 10.4 and 10.0.4 are the same thing. >> >> Have I misunderstood what is being proposed? >> > > The software is only ever going to report 10.0.4 OR 10.4. The area of > concern is that people are going to get annoyed at saying: "that was fixed > in 10.0.4" and so mailing lists and other forums where people write the > numbers instead of a computer are going to be littered with: "that was > fixed in 10.4". > > So now human speak and machine speak are disjointed. > > The machine form output for that release is going to be 100004 regardless > of the decision to make the human form 10.4 or 10.0.4 > > Do you have a problem with the human form and machine forms of the version > number being different in this respect? I don't - for me the decision of a > choice for the human form is not influenced by the fact the machine form > has 6 digits (with leading zeros which the human form elides...). > > This thread started with complaint that people are parsing our human form > output instead of the machine form. 
The OP later admitted that he didn't > realize that a machine form was so readily available. That is worrisome, > since I doubt that is an isolated incident, and leads to the discussion of > what form the human intended version should take. > > David J. > >
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
On 19/06/16 13:02, Michael Paquier wrote: > On Sun, Jun 19, 2016 at 7:38 PM, Vik Fearing wrote: >> On 19/06/16 12:28, Michael Paquier wrote: >>> On Sun, Jun 19, 2016 at 5:56 PM, Michael Paquier >>> Or in short the attached. >> >> The code looks good to me but why no documentation? > > Because that's a long day... Thanks! Now you have it. Quick bikeshed: I think the column should be called conninfo and not conn_info to match other places it's used. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 2:14 PM, Mark Dilger wrote: > > > On Jun 20, 2016, at 11:06 AM, Joshua D. Drake > wrote: > > > > On 06/20/2016 10:45 AM, Mark Dilger wrote: > > > >>> Now maybe you have some other idea in mind, but I don't quite > >>> understand what it is. > >> > >> I do, indeed, and here it is: > >> > >> When considering whether to *back port* a change, we typically do so > >> on the basis that bug fixes are back ported, features are not. In my > >> proposed versioning scheme, you could back port any feature that is > >> compatible with the old version, and bump the middle number to alert > >> users that you've not just back ported a bug fix, but a feature. Any > new > >> features that are not compatible don't get back ported. > >> > >> If you fix a bug on master during development, you can back port that > >> as well. But instead of bumping the middle number, you bump the last > >> number. > >> > >> Somebody running a version that is three major versions back could > >> still get the advantage of new features, so long as those new features > >> are not incompatible. It's frequently not nearly so easy to run > pg_upgrade > >> as it is to run rpm -U. You get downtime either way, but the elapsed > >> time of that downtime is orders of magnitude different. > >> > >> Someone else running that same version from three major versions ago > >> can take a more conservative policy, and only upgrade bug fixes, and > >> forego the back ported features. > >> > >> You still have one major version release per year. At that time, you > can > >> also release back-port versions of prior major versions. > > > > OFFLIST: > > > > You are fighting a losing if noble battle. > > I think all my emails on this subject have been seriously misunderstood. > Perhaps the problem is that I don't understand some critical issue. 
Can > you please help me understand this part: > > It seems like people want releases, starting with 10.0, to be named things > like 10.0, 10.1, 10.2,..., but for the purposes of communicating with > programs > like nagios refer to them as 10.0.0, 10.0.1, 10.0.2 > > Is that right? > > That's the part that really annoys me, and I keep trying to argue for not > doing > that, and people keep replying to other parts of my messages rather than > replying to the core part of what I am saying. > > Why would any sensible person want a release to sometimes be called > "10.4", but the exact same release sometimes referred to as "10.0.4"? > This is just going to confuse average software users, as they would never > expect that 10.4 and 10.0.4 are the same thing. > > Have I misunderstood what is being proposed? > The software is only ever going to report 10.0.4 OR 10.4. The area of concern is that people are going to get annoyed at saying: "that was fixed in 10.0.4" and so mailing lists and other forums where people write the numbers instead of a computer are going to be littered with: "that was fixed in 10.4". So now human speak and machine speak are disjointed. The machine form output for that release is going to be 100004 regardless of the decision to make the human form 10.4 or 10.0.4 Do you have a problem with the human form and machine forms of the version number being different in this respect? I don't - for me the decision of a choice for the human form is not influenced by the fact the machine form has 6 digits (with leading zeros which the human form elides...). This thread started with complaint that people are parsing our human form output instead of the machine form. The OP later admitted that he didn't realize that a machine form was so readily available. That is worrisome, since I doubt that is an isolated incident, and leads to the discussion of what form the human intended version should take. David J.
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 1:48 PM, Mark Dilger wrote: > > > On Jun 20, 2016, at 9:38 AM, David G. Johnston < > david.g.johns...@gmail.com> wrote: > > > > On Mon, Jun 20, 2016 at 12:28 PM, Mark Dilger > wrote: > > > > > On Jun 20, 2016, at 8:53 AM, Mark Dilger > wrote: > > > > > > > > > This is not a plea for keeping the three part versioning system. It's > just > > > a plea not to have a 2 part versioning system masquerading as a three > > > part versioning system, or vice versa. > > > > To clarify my concern, I never want to have to write code like this: > > > > CASE WHEN pg_version eq '11.1' OR pg_version eq '11.0.1' THEN > foo() > >WHEN pg_version eq '11.2' OR pg_version eq '11.0.2' > THEN bar() > > > > or > > > > if (0 == strcmp(pg_version_string, "11.1") || 0 == > strcmp(pg_version_string, "11.0.1")) > > foo(); > > else if (0 == strcmp(pg_version_string, "11.2") || 0 == > strcmp(pg_version_string, "11.0.2")) > > bar(); > > > > either in sql, perl, c, java, or anywhere else. As soon as you have two > different > > formats for the version string, you get into this hell. Yeah, ok, you > may have > > a sql level function for this, but I'm thinking about applications > somewhat removed > > from a direct connection to the database, where you can't be sure which > format > > you'll be handed. > > > > Now you argue for keeping the middle number on pure compatibility > grounds... > > I am not arguing for that. I'm arguing against having two different > versions of the > same thing. If you go with a two part versioning scheme that is sometimes > written as a > three part versioning scheme, you make every bit of code that deals with > it from now on > have to deal with both possible versions in order to be robust. > > My preference is to have a three part versioning scheme where all three > parts have > different purposes. 
But since the community seems to have no interest in > that, and > have largely (if not universally) rejected that idea, I'm falling back to > merely arguing > that if we're going to have a two part versioning system, we should do > that everywhere, > and if we can't do that everywhere, then we should do that nowhere. > > You appear to be arguing that we should have a versioning scheme that is > sometimes > two parts, and sometimes three parts, but when three parts, always make > the middle > number zero. The part of that which bothers me most is not the "always > zero" part. It's > the "sometimes two parts, sometimes three parts" part > The machine representation of the version number is 6 digits without any punctuation. The human representation for version numbers >= 10 is two integers separated using a full-stop/period/decimal "." You would never write 10.0.2 instead of 10.2 just like you'd never write 1002 instead of 12. You only have to do more than one thing if you are using the wrong representation and are trying to be robust. While I understand that some people may indeed be doing that or otherwise stuck in that unfortunate circumstance, accommodating their needs amounts to nothing other than maintaining compatibility for unsupported usage. I'll admit we probably could have been more explicit on what we do consider proper usage - which is why I have at least some sympathy for the position. David J.
Re: [HACKERS] 10.0
> On Jun 20, 2016, at 11:06 AM, Joshua D. Drake wrote: > > On 06/20/2016 10:45 AM, Mark Dilger wrote: > >>> Now maybe you have some other idea in mind, but I don't quite >>> understand what it is. >> >> I do, indeed, and here it is: >> >> When considering whether to *back port* a change, we typically do so >> on the basis that bug fixes are back ported, features are not. In my >> proposed versioning scheme, you could back port any feature that is >> compatible with the old version, and bump the middle number to alert >> users that you've not just back ported a bug fix, but a feature. Any new >> features that are not compatible don't get back ported. >> >> If you fix a bug on master during development, you can back port that >> as well. But instead of bumping the middle number, you bump the last >> number. >> >> Somebody running a version that is three major versions back could >> still get the advantage of new features, so long as those new features >> are not incompatible. It's frequently not nearly so easy to run pg_upgrade >> as it is to run rpm -U. You get downtime either way, but the elapsed >> time of that downtime is orders of magnitude different. >> >> Someone else running that same version from three major versions ago >> can take a more conservative policy, and only upgrade bug fixes, and >> forego the back ported features. >> >> You still have one major version release per year. At that time, you can >> also release back-port versions of prior major versions. > > OFFLIST: > > You are fighting a losing if noble battle. I think all my emails on this subject have been seriously misunderstood. Perhaps the problem is that I don't understand some critical issue. Can you please help me understand this part: It seems like people want releases, starting with 10.0, to be named things like 10.0, 10.1, 10.2,..., but for the purposes of communicating with programs like nagios refer to them as 10.0.0, 10.0.1, 10.0.2 Is that right? 
That's the part that really annoys me, and I keep trying to argue for not doing that, and people keep replying to other parts of my messages rather than replying to the core part of what I am saying. Why would any sensible person want a release to sometimes be called "10.4", but the exact same release sometimes referred to as "10.0.4"? This is just going to confuse average software users, as they would never expect that 10.4 and 10.0.4 are the same thing. Have I misunderstood what is being proposed? The only reason I talk about using the middle number for this other purpose is to forestall people assuming that they can elide the middle number of a three number system, such that it gets elided sometimes but not other times. I was quite clear in my email this morning that I'm not arguing for a three number system. I'm perfectly ok with a two number system. I just don't want a "let's confuse everybody by making each and every release have two different names" system. mark
Re: [HACKERS] 10.0
Mark Dilger wrote: > When considering whether to *back port* a change, we typically do so > on the basis that bug fixes are back ported, features are not. In my > proposed versioning scheme, you could back port any feature that is > compatible with the old version, and bump the middle number to alert > users that you've not just back ported a bug fix, but a feature. Any new > features that are not compatible don't get back ported. So instead of having about 5 branches to maintain, we will end up with 5N branches, because we will still have to support the initial .0 branches of each major, plus .1, .2 ... of each major branch? Granted, we could have major releases not every year but perhaps every 2 or 3 years; non-catversion-bumping patches would still be released quicker than that, in middle-number-increasing releases. But like Robert, I don't think that the number of features we could add this way would be numerous enough to support this idea in the long run. I think this increases the load on committers many times. Count me out. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] 10.0
> On Jun 20, 2016, at 9:38 AM, David G. Johnston > wrote: > > On Mon, Jun 20, 2016 at 12:28 PM, Mark Dilger wrote: > > > On Jun 20, 2016, at 8:53 AM, Mark Dilger wrote: > > > > > > This is not a plea for keeping the three part versioning system. It's just > > a plea not to have a 2 part versioning system masquerading as a three > > part versioning system, or vice versa. > > To clarify my concern, I never want to have to write code like this: > > CASE WHEN pg_version eq '11.1' OR pg_version eq '11.0.1' THEN foo() >WHEN pg_version eq '11.2' OR pg_version eq '11.0.2' THEN > bar() > > or > > if (0 == strcmp(pg_version_string, "11.1") || 0 == > strcmp(pg_version_string, "11.0.1")) > foo(); > else if (0 == strcmp(pg_version_string, "11.2") || 0 == > strcmp(pg_version_string, "11.0.2")) > bar(); > > either in sql, perl, c, java, or anywhere else. As soon as you have two > different > formats for the version string, you get into this hell. Yeah, ok, you may > have > a sql level function for this, but I'm thinking about applications somewhat > removed > from a direct connection to the database, where you can't be sure which format > you'll be handed. > > Now you argue for keeping the middle number on pure compatibility grounds... I am not arguing for that. I'm arguing against having two different versions of the same thing. If you go with a two part versioning scheme that is sometimes written as a three part versioning scheme, you make every bit of code that deals with it from now on have to deal with both possible versions in order to be robust. My preference is to have a three part versioning scheme where all three parts have different purposes. But since the community seems to have no interest in that, and have largely (if not universally) rejected that idea, I'm falling back to merely arguing that if we're going to have a two part versioning system, we should do that everywhere, and if we can't do that everywhere, then we should do that nowhere. 
You appear to be arguing that we should have a versioning scheme that is sometimes two parts, and sometimes three parts, but when three parts, always make the middle number zero. The part of that which bothers me most is not the "always zero" part. It's the "sometimes two parts, sometimes three parts" part. mark
Re: [HACKERS] 10.0
> On Jun 20, 2016, at 9:43 AM, Robert Haas wrote: > > On Mon, Jun 20, 2016 at 12:26 PM, Mark Dilger wrote: >>> In practical effect that is exactly what your proposal does. You just feel >>> better because you defined when B is allowed to change even though it never >>> should happen based upon our project policy. And any rare exception can >>> justifiably be called a bug fix because, face it, it would only happen if >>> someone reports a bug. >> >> Why are you refusing to acknowledge the difference between features that >> require a pg_upgrade and features that don't? > > The amount of additional committer work that would be created by > making that distinction would be large. Currently, we're on an annual > release cycle. Every commit that's not a bug fix gets committed to > exactly one branch: master. Inevitably, there are multiple changes > per cycle - dozens, probably - that change initial catalog contents > and would therefore require pg_upgrade. > > Suppose we switched to a semi-annual release cycle where every other > release required a pg_upgrade, so once per year same as now, and the > other ones did not. The only way to do that would be to have two > active development branches, one of which accepts only changes that > don't bump catversion or xlog_page_magic and the other of which > accepts changes of all sorts. Every patch that qualifies for the > no-pg-upgrade-required branch would have to be committed twice, > resolving conflicts as necessary. > > Also, over time, the number of supported branches would approximately > double. With a five year support window, it's currently about six. > If you had another set of semi-major releases in between the main set > of releases, you'd end up with 11 or 12 active branches, which would > make back-patching significantly more burdensome than currently. > > Now maybe you have some other idea in mind, but I don't quite > understand what it is. 
I do, indeed, and here it is: When considering whether to *back port* a change, we typically do so on the basis that bug fixes are back ported, features are not. In my proposed versioning scheme, you could back port any feature that is compatible with the old version, and bump the middle number to alert users that you've not just back ported a bug fix, but a feature. Any new features that are not compatible don't get back ported. If you fix a bug on master during development, you can back port that as well. But instead of bumping the middle number, you bump the last number. Somebody running a version that is three major versions back could still get the advantage of new features, so long as those new features are not incompatible. It's frequently not nearly so easy to run pg_upgrade as it is to run rpm -U. You get downtime either way, but the elapsed time of that downtime is orders of magnitude different. Someone else running that same version from three major versions ago can take a more conservative policy, and only upgrade bug fixes, and forego the back ported features. You still have one major version release per year. At that time, you can also release back-port versions of prior major versions.
Re: [HACKERS] parallel.c is not marked as test covered
Robert Haas writes: > On Sun, Jun 19, 2016 at 10:23 PM, Tom Lane wrote: >> Personally, I'm +1 for such tinkering if it makes the feature either more >> controllable or more understandable. After reading the comments at the >> head of nodeGather.c, though, I don't think that single_copy is either >> understandable or useful, and merely renaming it won't help. Apparently, >> it runs code in the worker, except when it doesn't, and even when it does, >> it's absolutely guaranteed to be a performance loss because the leader is >> doing nothing. What in the world is the point? > The single_copy flag allows a Gather node to have a child plan which > is not intrinsically parallel. OK, but I'm very dubious of your claim that this has any use except for testing purposes. It certainly has no such use today. Therefore, the behavior of falling back to running in the leader seems like an anti-feature to me. If we want to test running in a worker, then we want to test that, not maybe test it. I propose that the behavior we actually want here is to execute in a worker, full stop. If we can't get one, wait till we can. If we can't get one because no slots have been allocated at all, fail. That would make the behavior deterministic enough for the regression tests to rely on it. And while I don't know what this mode should be called, I'm pretty sure that neither "single_copy" nor "pipeline" are useful descriptions. regards, tom lane
[HACKERS] pager_min_lines tab completion
I have just noticed that when I added the pager_min_lines setting to psql in 9.5, I omitted to add it to the relevant piece of tab-complete.c. If no-one objects I propose to remedy that shortly and backpatch it. cheers andrew
Re: [HACKERS] New design for FK-based join selectivity estimation
Tomas Vondra writes: > On 06/18/2016 06:52 PM, Tom Lane wrote: >> I concur, actually, but I feel that it's out of scope for this >> particular patch, which is only trying to replace the functionality >> that was committed previously. If you want to come up with a patch on >> top of this that adds some accounting for NULLs, I'd be willing to >> consider it as a post-beta2 improvement. > Sure, fair enough. By post-beta2 you mean for 9.7, or still for 9.6? I think it'd be legitimate to address the NULLs question for 9.6, as long as the patch is not very large. If it does turn out to be invasive or otherwise hard to review, waiting for 9.7 might be more prudent. But I argued upthread that failing to consider nulls was a bug in the original patch, so dealing with them could be considered a bug fix. > If I could wish one more thing - could you briefly explain why you > rewrote the patch the way you did? I'd like to learn from this and while > I think I kinda understand most of the changes, I'm not really sure I > got it right. I don't at the moment recall everything I changed, but I'm happy to answer questions that are more specific than that one ... regards, tom lane
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 12:26 PM, Mark Dilger wrote: >> In practical effect that is exactly what your proposal does. You just feel >> better because you defined when B is allowed to change even though it never >> should happen based upon our project policy. And any rare exception can >> justifiably be called a bug fix because, face it, it would only happen if >> someone reports a bug. > > Why are you refusing to acknowledge the difference between features that > require a pg_upgrade and features that don't? The amount of additional committer work that would be created by making that distinction would be large. Currently, we're on an annual release cycle. Every commit that's not a bug fix gets committed to exactly one branch: master. Inevitably, there are multiple changes per cycle - dozens, probably - that change initial catalog contents and would therefore require pg_upgrade. Suppose we switched to a semi-annual release cycle where every other release required a pg_upgrade, so once per year same as now, and the other ones did not. The only way to do that would be to have two active development branches, one of which accepts only changes that don't bump catversion or xlog_page_magic and the other of which accepts changes of all sorts. Every patch that qualifies for the no-pg-upgrade-required branch would have to be committed twice, resolving conflicts as necessary. Also, over time, the number of supported branches would approximately double. With a five year support window, it's currently about six. If you had another set of semi-major releases in between the main set of releases, you'd end up with 11 or 12 active branches, which would make back-patching significantly more burdensome than currently. Now maybe you have some other idea in mind, but I don't quite understand what it is. It's not likely to ever happen that we go through a whole 12 month release cycle and then get to the end of it and say "huh, I guess we never bumped catversion or xlog_page_magic". 
If that ever does happen, it's probably a sign that nobody is doing any meaningful feature development any more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] New design for FK-based join selectivity estimation
Tomas Vondra writes: > OK, thanks. The one thing we haven't done is testing the performance, to > see how this fares. So I've repeated the tests I've done on the original > version of the patch here

Hmm. I'm not that excited about these results, for a couple of reasons:

* AFAICS, all the numbers are collected from the first execution of a query within a session, meaning caches aren't populated and everything has to be loaded from disk (or at least shared buffers).

* I do not credit hundreds of completely redundant FKs between the same two tables as being representative of plausible real-world cases.

I modified your new script as attached to get rid of the first problem. Comparing HEAD with HEAD minus commit 100340e2d, in non-assert builds, I get results like this for the 100-foreign-key case (with repeat count 1000 for the data collection script):

select code, test, avg(time), stddev(time) from data group by 1,2 order by 1,2;

  code  | test  |        avg         |       stddev
--------+-------+--------------------+---------------------
 head   | t1/t2 | 0.065045045045045  | 0.00312962651081508
 head   | t3/t4 | 0.168561561561562  | 0.00379087132124092
 head   | t5/t6 | 0.127671671671672  | 0.00326275949269809
 head   | t7/t8 | 0.391057057057056  | 0.00590249325300915
 revert | t1/t2 | 0.0613933933933937 | 0.0032082678131875
 revert | t3/t4 | 0.0737507507507501 | 0.00221692725859567
 revert | t5/t6 | 0.123759759759759  | 0.00431225386651805
 revert | t7/t8 | 0.154082082082081  | 0.00405118420422266
(8 rows)

So for the somewhat-credible cases, ie 100 unrelated foreign keys, I get about 3% - 6% slowdown. The 100-duplicate-foreign-keys case does indeed look like about a 2X slowdown, but as I said, I do not think that has anything to do with interesting usage. In any case, the situation I was worried about making better was queries joining many tables, which none of this exercises at all.
regards, tom lane

DB=$1
COUNT=$2

for i in `seq 1 $COUNT`; do
  echo "EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 USING (a);"
done | psql $DB | grep 'Planning' | tail -n +2 | awk '{print "t1/t2\t"$3}'

for i in `seq 1 $COUNT`; do
  echo "EXPLAIN ANALYZE SELECT * FROM t3 JOIN t4 USING (a);"
done | psql $DB | grep 'Planning' | tail -n +2 | awk '{print "t3/t4\t"$3}'

for i in `seq 1 $COUNT`; do
  echo "EXPLAIN ANALYZE SELECT * FROM t5 JOIN t6 USING (a,b,c,d);"
done | psql $DB | grep 'Planning' | tail -n +2 | awk '{print "t5/t6\t"$3}'

for i in `seq 1 $COUNT`; do
  echo "EXPLAIN ANALYZE SELECT * FROM t7 JOIN t8 USING (a,b,c,d);"
done | psql $DB | grep 'Planning' | tail -n +2 | awk '{print "t7/t8\t"$3}'
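[Editor's aside: for anyone replaying this comparison outside psql, the per-group mean and standard deviation that Tom's SQL aggregate computes can be reproduced with Python's statistics module. The values below are invented placeholders, not the measurements above; this only sketches the aggregation step.]

```python
# Sketch: group (code, test) pairs and compute mean/stddev of planning
# times, mirroring "select code, test, avg(time), stddev(time) ... group by 1,2".
# The sample values are made up for illustration, not benchmark data.
from collections import defaultdict
from statistics import mean, stdev

rows = [
    ("head", "t1/t2", 0.065), ("head", "t1/t2", 0.067), ("head", "t1/t2", 0.063),
    ("revert", "t1/t2", 0.061), ("revert", "t1/t2", 0.062), ("revert", "t1/t2", 0.060),
]

groups = defaultdict(list)
for code, test, t in rows:
    groups[(code, test)].append(t)

for code, test in sorted(groups):
    ts = groups[(code, test)]
    print(f"{code} | {test} | {mean(ts):.4f} | {stdev(ts):.4f}")
```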
Re: [HACKERS] 10.0
On Mon, Jun 20, 2016 at 12:28 PM, Mark Dilger wrote:
>> On Jun 20, 2016, at 8:53 AM, Mark Dilger wrote:
>>
>> This is not a plea for keeping the three part versioning system. It's just
>> a plea not to have a 2 part versioning system masquerading as a three
>> part versioning system, or vice versa.
>
> To clarify my concern, I never want to have to write code like this:
>
> CASE WHEN pg_version eq '11.1' OR pg_version eq '11.0.1' THEN foo()
>      WHEN pg_version eq '11.2' OR pg_version eq '11.0.2' THEN bar()
>
> or
>
> if (0 == strcmp(pg_version_string, "11.1") || 0 == strcmp(pg_version_string, "11.0.1"))
>     foo();
> else if (0 == strcmp(pg_version_string, "11.2") || 0 == strcmp(pg_version_string, "11.0.2"))
>     bar();
>
> either in sql, perl, c, java, or anywhere else. As soon as you have two different
> formats for the version string, you get into this hell. Yeah, ok, you may have
> a sql level function for this, but I'm thinking about applications somewhat removed
> from a direct connection to the database, where you can't be sure which format
> you'll be handed.

Now you argue for keeping the middle number on pure compatibility grounds... The correct format is: 110001 and 110002. Which pretty much boils down to "we're keeping the middle number but it will always be zero". So, I'll suppose you are giving a +1 to keeping the human-readable display 10.0.x - and will let others interpret your reasons as they will. David J.
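[Editor's aside: the six-digit forms David mentions follow the integer server_version_num convention, where for a two-part scheme the value is major * 10000 + minor, so 110001 reads as 11.1. A small sketch of that encoding; the function names are illustrative, not PostgreSQL API:]

```python
# Two-part numeric version encoding discussed above: 110001 means 11.1.
# Function names are made up for illustration.
def decode_version_num(n: int) -> tuple[int, int]:
    # High digits are the major version, low four digits the minor.
    return n // 10000, n % 10000

def encode_version_num(major: int, minor: int) -> int:
    return major * 10000 + minor

print(decode_version_num(110001))  # (11, 1)
print(decode_version_num(110002))  # (11, 2)
```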
Re: [HACKERS] 10.0
> On Jun 20, 2016, at 8:53 AM, Mark Dilger wrote:
>
> This is not a plea for keeping the three part versioning system. It's just
> a plea not to have a 2 part versioning system masquerading as a three
> part versioning system, or vice versa.

To clarify my concern, I never want to have to write code like this:

CASE WHEN pg_version eq '11.1' OR pg_version eq '11.0.1' THEN foo()
     WHEN pg_version eq '11.2' OR pg_version eq '11.0.2' THEN bar()

or

if (0 == strcmp(pg_version_string, "11.1") || 0 == strcmp(pg_version_string, "11.0.1"))
    foo();
else if (0 == strcmp(pg_version_string, "11.2") || 0 == strcmp(pg_version_string, "11.0.2"))
    bar();

either in sql, perl, c, java, or anywhere else. As soon as you have two different formats for the version string, you get into this hell. Yeah, ok, you may have a sql level function for this, but I'm thinking about applications somewhat removed from a direct connection to the database, where you can't be sure which format you'll be handed.

mark
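[Editor's aside: one way around the two-format hell Mark describes is to normalize both spellings before comparing. A hypothetical sketch - the function name and the assumption that a zero middle digit is elidable are the editor's, not from the thread:]

```python
# Hypothetical normalizer: treat '11.0.1' and '11.1' as the same release
# by dropping a zero middle component. An illustration of the problem
# Mark raises, not an API from PostgreSQL itself.
def normalize(version: str) -> tuple[int, ...]:
    parts = [int(p) for p in version.split(".")]
    if len(parts) == 3 and parts[1] == 0:
        parts = [parts[0], parts[2]]
    return tuple(parts)

assert normalize("11.1") == normalize("11.0.1") == (11, 1)
assert normalize("11.2") == normalize("11.0.2")
```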
Re: [HACKERS] 10.0
> > In practical effect that is exactly what your proposal does. You just feel > better because you defined when B is allowed to change even though it never > should happen based upon our project policy. And any rare exception can > justifiably be called a bug fix because, face it, it would only happen if > someone reports a bug. Why are you refusing to acknowledge the difference between features that require a pg_upgrade and features that don't?
Re: [HACKERS] parallel.c is not marked as test covered
On Mon, Jun 20, 2016 at 1:26 AM, Amit Kapila wrote: > I have done analysis on this and didn't found any use case where passing > CURSOR_OPT_PARALLEL_OK in exec_stmt_execsql() can help in parallelizing the > queries. Basically, there seems to be three ways in which function > exec_stmt_execsql can be used inside plpgsql. > > a. In execution of statement used in open cursor (via exec_stmt_open()) > b. In execution of statement when used in for loop (via exec_stmt_forc()) > c. In execution of SQL statement (via direct call to exec_stmt_execsql()) > > For the cases (a) and (b), one can construct a case where execution needs to > be suspended, so using parallel mode for those cases will not be meaningful. > For case (c), the Select statement seems to execute successfully only when > there is a *into* target, else it will give an error after execution as > below: > ERROR: query has no destination for result data > HINT: If you want to discard the results of a SELECT, use PERFORM instead. > > Now, if one has used into clause, the number of rows will be restricted in > which cases parallelism won't be enabled. OK, sounds like I got that right, then. Thanks for verifying. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] parallel.c is not marked as test covered
On Sun, Jun 19, 2016 at 10:23 PM, Tom Lane wrote:
>> although I fear we
>> might be getting to a level of tinkering with parallel query that
>> starts to look more like feature development.
>
> Personally, I'm +1 for such tinkering if it makes the feature either more
> controllable or more understandable. After reading the comments at the
> head of nodeGather.c, though, I don't think that single_copy is either
> understandable or useful, and merely renaming it won't help. Apparently,
> it runs code in the worker, except when it doesn't, and even when it does,
> it's absolutely guaranteed to be a performance loss because the leader is
> doing nothing. What in the world is the point?

The single_copy flag allows a Gather node to have a child plan which is not intrinsically parallel. For example, consider these two plans:

Gather
  -> Parallel Seq Scan

Gather
  -> Seq Scan

The first plan is safe regardless of the setting of the single-copy flag. If the plan is executed in every worker, the results in aggregate across all workers will add up to the results of a non-parallel sequential scan of the table. The second plan is safe only if the # of workers is 1 and the single-copy flag is set. If either of those things is not true, then more than one process might try to execute the sequential scan, and the result will be that you'll get N copies of the output, where N = (# of parallel workers) + (leader also participates ? 1 : 0).

For force_parallel_mode = {on, regress}, the single-copy behavior is essential. We can run all of those plans inside a worker, but only because we know that the leader won't also try to run those same plans. But it might be useful in other cases too. For example, imagine a plan like this:

Join
  -> Join
       -> Join
            -> Join
                 -> Gather (single copy)
                      -> Join
                           -> Join
                                -> Join
                                     -> Join
                                          -> Scan (not parallel aware)

This is pipeline parallelism.
Instead of having one process do all of the joins, you can have a worker do some subset of them and then send the outputs back to the leader, which can do the rest and return the results to the client. This is actually kind of hard to get right - according to the literature I've read on parallel query - because you can get pipeline stalls that erase most or all of the benefit, but it's a possible area to explore.

Actually, though, the behavior I really want the single_copy flag to embody is not so much "only one process runs this" but "leader does not participate unless there are no workers", which is the same thing only when the budgeted number of workers is one. This is useful because of plans like this:

Finalize HashAggregate
  -> Gather
       -> Partial HashAggregate
            -> Hash Join
                 -> Parallel Seq Scan on large_table
                 -> Hash
                      -> Seq Scan on another_large_table

Unless the # of groups is very small, the leader actually won't perform very much of the parallel-seq-scan on large_table, because it'll be too busy aggregating the results from the other workers. However, if it ever reaches a point where the Gather can't read a tuple from one of the workers immediately, which is almost certain to occur right at the beginning of execution, it's going to go build a copy of the hash table so that it can "help" with the hash join. By the time it finishes, the workers will have done the same and be feeding it results, and it will likely get little use out of the copy that it built itself. But it will still have gone to the effort of building it. For 10.0, Thomas Munro has already done a bunch of work, and will be doing more work, so that we can build a shared hash table, rather than one copy per worker. That's going to be better when the table is large anyway, so maybe this particular case won't matter so much. But in general when a partial path has a substantial startup cost, it may be better for the leader not to get involved.
In a case like this, it's hard to see how the leader's involvement can ever hurt:

Finalize HashAggregate
  -> Gather
       -> Partial HashAggregate
            -> Nested Loop
                 -> Parallel Seq Scan on large_table
                 -> Index Scan on some_other_table

Even if the leader processes only one or two pages of large_table, there's no real harm done unless, I suppose, the combine function is fabulously expensive, which seems unlikely. The lack of harm stems directly from the fact that there's no startup cost here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
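[Editor's aside: the correctness point about non-parallel-aware children can be sketched in miniature - if N processes each run a full scan, the combined output is N copies, whereas a parallel-aware scan partitions its input. A toy model of the described rule, not actual PostgreSQL executor code:]

```python
# Toy model of why a non-parallel-aware child under Gather needs the
# single-copy restriction. "Workers" are simulated sequentially here.
table = list(range(10))

def seq_scan(_worker, _nworkers):
    return list(table)               # every process scans the whole table

def parallel_seq_scan(worker, nworkers):
    return table[worker::nworkers]   # block-wise split, modeled as striding

def gather(child, nworkers):
    out = []
    for w in range(nworkers):        # collect output from each simulated worker
        out.extend(child(w, nworkers))
    return out

assert sorted(gather(parallel_seq_scan, 3)) == table   # exactly one copy
assert len(gather(seq_scan, 3)) == 3 * len(table)      # N duplicate copies
assert gather(seq_scan, 1) == table                    # single-copy case is safe
```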
Re: [HACKERS] 10.0
On Monday, June 20, 2016, Mark Dilger wrote: > > > On Jun 18, 2016, at 5:48 PM, Josh Berkus > wrote: > > > > On 06/16/2016 11:01 PM, Craig Ringer wrote: > >> > >> I thought about raising this, but I think in the end it's replacing one > >> confusing and weird versioning scheme for another confusing and weird > >> versioning scheme. > >> > >> It does have the advantage that that compare a two-part major like > >> 090401 vs 090402 won't be confused when they compare 100100 and 100200, > >> since it'll be 11 and 12. So it's more backward-compatible. But > >> ugly. > >> > > > > Realistically, though, we're more likely to end up with 10.0.1 than > > 10.1. I don't think we're anywhere near plumbing the depths of the > > stuff which will break because folks are parsing our version numbers > > with regexes. In more major software, this will break nagios > > check_postgres. > > Having a 3 part versioning scheme where the middle portion is always > zero makes the least sense to me of any of the proposals. If we're going > to have the pain and hurting of moving to a 2 part versioning scheme, > and breaking nagios and what not, then lets just get on with it. If we're > going to keep all three parts, can we please use my proposal earlier in > this thread where A.B.C are allocated for: > > C++: bug fixes only > B++: new features added, but still backward compatible with prior version > A++: new features added, not backward compatible, pg_upgrade required > > If every new feature release ends up requiring pg_upgrade, then B will > always be zero, which is no worse than your proposal. But at least users > can > refer to part B to learn something useful, namely whether they will need to > run pg_upgrade as part of upgrading their existing cluster. > > This has the advantage that new minor features, like adding a new guc, can > be released sooner than the next major release, but does not have to be > released in disguise as if it were a bug fix. 
> > This is not a plea for keeping the three part versioning system. It's just > a plea not to have a 2 part versioning system masquerading as a three > part versioning system, or vice versa. > > In practical effect that is exactly what your proposal does. You just feel better because you defined when B is allowed to change even though it never should happen based upon our project policy. And any rare exception can justifiably be called a bug fix because, face it, it would only happen if someone reports a bug. If we keep the middle digit it will be for compatibility reasons. Let's not cause people to infer something that isn't true by saying it could change. David J.
Re: [HACKERS] 10.0
> On Jun 18, 2016, at 5:48 PM, Josh Berkus wrote:
>
> On 06/16/2016 11:01 PM, Craig Ringer wrote:
>>
>> I thought about raising this, but I think in the end it's replacing one
>> confusing and weird versioning scheme for another confusing and weird
>> versioning scheme.
>>
>> It does have the advantage that comparing a two-part major like
>> 090401 vs 090402 won't be confused when they compare 100100 and 100200,
>> since it'll be 11 and 12. So it's more backward-compatible. But ugly.
>
> Realistically, though, we're more likely to end up with 10.0.1 than
> 10.1. I don't think we're anywhere near plumbing the depths of the
> stuff which will break because folks are parsing our version numbers
> with regexes. In more major software, this will break nagios
> check_postgres.

Having a 3 part versioning scheme where the middle portion is always zero makes the least sense to me of any of the proposals. If we're going to have the pain and hurting of moving to a 2 part versioning scheme, and breaking nagios and what not, then let's just get on with it. If we're going to keep all three parts, can we please use my proposal earlier in this thread where A.B.C are allocated for:

C++: bug fixes only
B++: new features added, but still backward compatible with prior version
A++: new features added, not backward compatible, pg_upgrade required

If every new feature release ends up requiring pg_upgrade, then B will always be zero, which is no worse than your proposal. But at least users can refer to part B to learn something useful, namely whether they will need to run pg_upgrade as part of upgrading their existing cluster. This has the advantage that new minor features, like adding a new guc, can be released sooner than the next major release, but does not have to be released in disguise as if it were a bug fix. This is not a plea for keeping the three part versioning system.
It's just a plea not to have a 2 part versioning system masquerading as a three part versioning system, or vice versa. mark
Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
Teodor Sigaev writes: >> We're really quickly running out of time to get this done before >> beta2. Please don't commit anything that's going to break the tree >> because we only have about 72 hours before the wrap, but if it's >> correct then it should go in. > Isn't late now? Or wait to beta2 is out? Let's wait till after beta2. regards, tom lane
Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?
We're really quickly running out of time to get this done before beta2. Please don't commit anything that's going to break the tree because we only have about 72 hours before the wrap, but if it's correct then it should go in. Isn't late now? Or wait to beta2 is out? -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] Bug in to_timestamp().
On 13.06.2016 18:52, amul sul wrote:
> Hi, It's look like bug in to_timestamp() function when format string has more whitespaces compare to input string, see below:
>
> Ex.1: Two white spaces before HH24 whereas one before input time string
>
> postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD  HH24:MI:SS');
>       to_timestamp
> ------------------------
>  2016-06-13 05:43:36-07   <— incorrect time
> (1 row)
>
> Ex.2: One whitespace before format string
>
> postgres=# SELECT TO_TIMESTAMP('2016/06/13 15:43:36', ' YYYY/MM/DD HH24:MI:SS');
>          to_timestamp
> ------------------------------
>  0016-06-13 15:43:36-07:52:58   <— incorrect year
> (1 row)
>
> If there are one or more consecutive whitespaces in the format, we should skip those as long as we could get an actual field. Thoughts?
>
> Thanks & Regards, Amul Sul

From the docs about to_timestamp() (https://www.postgresql.org/docs/9.5/static/functions-formatting.html):

"These functions interpret input liberally, with minimal error checking. While they produce valid output, the conversion can yield unexpected results. For example, input to these functions is not restricted by normal ranges, thus to_date('20096040','YYYYMMDD') returns 2014-01-17 rather than causing an error. Casting does not have this behavior."

And it won't stop on some simple whitespace. By using to_timestamp you can get any output results by providing illegal input parameter values:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
      to_timestamp
------------------------
 2016-01-06 14:40:39+03
(1 row)

Alex Ignatov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Bug in to_timestamp().
On 20.06.2016 16:36, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas wrote:
>>> I think a space in the format string should skip a whitespace
>>> character in the input string, but not a non-whitespace character.
>>> It's my understanding that these functions exist in no small part for
>>> compatibility with Oracle, and Oracle declines to skip the digit '1'
>>> on the basis of an extra space in the format string, which IMHO is the
>>> behavior any reasonable user would expect.
>> So Amul and I are of one opinion and Tom is of another. Anyone else
>> have an opinion?
> I don't necessarily have an opinion yet. I would like to see more than
> just an unsupported assertion about what Oracle's behavior is. Also,
> how should FM mode affect this? regards, tom lane

Oracle:

SQL> SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS') from dual;
SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS') from dual
*
ERROR at line 1:
ORA-01843: not a valid month

PG:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS');
      to_timestamp
------------------------
 2016-01-06 14:40:39+03
(1 row)

I know about "These functions interpret input liberally, with minimal error checking. While they produce valid output, the conversion can yield unexpected results" from the docs, but by providing illegal input parameter values we get no exceptions or errors about that. I think that to_timestamp() needs to have more format checking than it has now.

Alex Ignatov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] primary_conninfo missing from pg_stat_wal_receiver
On 19/06/16 13:02, Michael Paquier wrote: > On Sun, Jun 19, 2016 at 7:38 PM, Vik Fearing wrote: >> On 19/06/16 12:28, Michael Paquier wrote: >>> On Sun, Jun 19, 2016 at 5:56 PM, Michael Paquier >>> Or in short the attached. >> >> The code looks good to me but why no documentation? > > Because that's a long day... Thanks! Now you have it. Thanks, this looks good. Could you please add it to the next commitfest so that it doesn't get lost and also so I can do an official review of it? -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Bug in to_timestamp().
Tom Lane wrote:
> I don't necessarily have an opinion yet. I would like to see more than
> just an unsupported assertion about what Oracle's behavior is. Also,
> how should FM mode affect this?

I can supply what Oracle 12.1 does:

SQL> SELECT to_timestamp('2016-06-13 15:43:36', ' YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;
TS
2016-06-13 15:43:36.0 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;
TS
2016-06-13 15:43:36.0 AD

SQL> SELECT to_timestamp('2016-06-1315:43:36', 'YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;
TS
2016-06-13 15:43:36.0 AD

(to_timestamp_tz behaves the same way.) So Oracle seems to make no difference between one or more spaces.

Yours, Laurenz Albe
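[Editor's aside: for one more data point outside Oracle and PostgreSQL, Python's datetime.strptime also collapses whitespace - a run of spaces in the format matches one or more whitespace characters in the input - which lines up with the Oracle results above. The example is the editor's, not from the thread:]

```python
# Python's strptime treats whitespace in the format as "one or more
# whitespace characters" in the input, so extra spaces on either side
# still parse - similar to the Oracle behavior shown above.
from datetime import datetime

fmt = "%Y/%m/%d %H:%M:%S"
a = datetime.strptime("2016/06/13 15:43:36", fmt)    # one space
b = datetime.strptime("2016/06/13   15:43:36", fmt)  # extra spaces in input
c = datetime.strptime("2016/06/13 15:43:36", "%Y/%m/%d  %H:%M:%S")  # extra space in format
assert a == b == c == datetime(2016, 6, 13, 15, 43, 36)
```

Note that, unlike Oracle's third example above, strptime would reject an input with no whitespace at all where the format has a space.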
Re: [HACKERS] Bug in to_timestamp().
Robert Haas writes: > On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas wrote: >> I think a space in the format string should skip a whitespace >> character in the input string, but not a non-whitespace character. >> It's my understanding that these functions exist in no small part for >> compatibility with Oracle, and Oracle declines to skip the digit '1' >> on the basis of an extra space in the format string, which IMHO is the >> behavior any reasonable user would expect. > So Amul and I are of one opinion and Tom is of another. Anyone else > have an opinion? I don't necessarily have an opinion yet. I would like to see more than just an unsupported assertion about what Oracle's behavior is. Also, how should FM mode affect this? regards, tom lane
Re: [HACKERS] parallel.c is not marked as test covered
On Thu, Jun 16, 2016 at 8:20 AM, Robert Haas wrote:
> On Wed, Jun 15, 2016 at 10:48 PM, Amit Kapila wrote:
> > exec_stmt_execsql() is used to execute SQL statements inside plpgsql which
> > includes dml statements as well, so probably you wanted to play safe by not
> > allowing the parallel option from that place. However, I think there shouldn't
> > be a problem in using CURSOR_OPT_PARALLEL_OK from this place as we have a
> > check in standard_planner which will take care of whether to choose parallel
> > mode or not for a particular statement. If you want, I can do more detailed
> > analysis and prepare a patch.
>
> That would be great. I have a vague recollection that that function
> might be called in some contexts where execution can be suspended,
> which would be no good for parallel query. But that might be wrong.

I have done analysis on this and didn't find any use case where passing CURSOR_OPT_PARALLEL_OK in exec_stmt_execsql() can help in parallelizing the queries. Basically, there seem to be three ways in which the function exec_stmt_execsql can be used inside plpgsql.

a. In execution of a statement used in open cursor (via exec_stmt_open())
b. In execution of a statement when used in a for loop (via exec_stmt_forc())
c. In execution of a SQL statement (via direct call to exec_stmt_execsql())

For the cases (a) and (b), one can construct a case where execution needs to be suspended, so using parallel mode for those cases will not be meaningful. For case (c), the Select statement seems to execute successfully only when there is an *into* target, else it will give an error after execution as below:

ERROR: query has no destination for result data
HINT: If you want to discard the results of a SELECT, use PERFORM instead.

Now, if one has used an into clause, the number of rows will be restricted, in which case parallelism won't be enabled. Do let me know if you see any case where generating a parallel path is meaningful via this path.
I am of the opinion that we should leave this code as is, or maybe add a comment explaining why we have decided not to enable parallelism in this path. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Bug in to_timestamp().
On Mon, Jun 20, 2016 at 8:19 AM, Robert Haas wrote: > On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas > wrote: > > On Mon, Jun 13, 2016 at 12:12 PM, Tom Lane wrote: > >> amul sul writes: > >>> It looks like a bug in the to_timestamp() function when the format string has > more whitespace compared to the input string, see below: > >> > >> No, I think this is a case of "input doesn't match the format string". > >> > >> As a rule of thumb, using to_timestamp() for input that could be parsed > >> just fine by the standard timestamp input function is not a particularly > >> good idea. to_timestamp() is meant to deal with input that is in a > >> well-defined format that happens to not be parsable by timestamp_in. > >> This example doesn't meet either of those preconditions. > > > > I think a space in the format string should skip a whitespace > > character in the input string, but not a non-whitespace character. > > It's my understanding that these functions exist in no small part for > > compatibility with Oracle, and Oracle declines to skip the digit '1' > > on the basis of an extra space in the format string, which IMHO is the > > behavior any reasonable user would expect. > > So Amul and I are of one opinion and Tom is of another. Anyone else > have an opinion? > > At least Tom's position has the benefit of being consistent with current behavior. The current implementation doesn't actually care what literal value you specify - any non-special character consumes a single token from the input, period. SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD--HH24:MI:SS'); SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD-HH24:MI:SS'); Both of the above exhibit the same behavior as if you had used a space instead of the hyphen as the group separator. The documentation should be updated to make this particular dynamic clearer. I don't see changing the general behavior of these "date formatting" functions as a worthwhile endeavor. Adding a less-liberal "parse_timestamp" function I could get behind.
IOW, the user seems to be happy with the fact that the "/" in the date matches his "-" but then complains that the space matches the number "1". You don't get to have it both ways. [re-reads the third usage note] Or maybe you do. We already define space as being a greedy operator (when not used in conjunction with FX). A greedy space-space sequence makes little sense on its face, and if we are going to be helpful here we should treat it as a single greedy space matcher. Note that "returns an error because to_timestamp expects one space only" is wrong - it errors because only a single space is captured and then the attempt to parse 'JUN' using "MON" fails. The following query doesn't fail, though it exhibits the same space discrepancy (it just gives the same "wrong" result). SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'FXYYYY/MM/DD HH24:MI:SS'); Given that we already partially special-case the space expression, I'd be inclined to consider Robert's and Amul's position on the matter. I think I'd redefine our treatment of space to be "zero or more" instead of "one or more" and require that it only match a literal space in the input. Having considered that, I'm not convinced it's worth a compatibility break. I'd much rather deprecate these versions and write slightly-less-liberal versions named . In any case I'd call the present wording a bug. Instead: A single space consumes a single token of input and then, unless the FX modifier is present, consumes zero or more subsequent literal spaces. Thus, using two spaces in a row without the FX modifier, while allowed, is unlikely to give you a satisfactory result. The first space will consume all available consecutive spaces so that the second space will be guaranteed to consume a non-space token from the input. David J.
Re: [HACKERS] Bug in to_timestamp().
On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas wrote: > On Mon, Jun 13, 2016 at 12:12 PM, Tom Lane wrote: >> amul sul writes: >>> It looks like a bug in the to_timestamp() function when the format string has more >>> whitespace compared to the input string, see below: >> >> No, I think this is a case of "input doesn't match the format string". >> >> As a rule of thumb, using to_timestamp() for input that could be parsed >> just fine by the standard timestamp input function is not a particularly >> good idea. to_timestamp() is meant to deal with input that is in a >> well-defined format that happens to not be parsable by timestamp_in. >> This example doesn't meet either of those preconditions. > > I think a space in the format string should skip a whitespace > character in the input string, but not a non-whitespace character. > It's my understanding that these functions exist in no small part for > compatibility with Oracle, and Oracle declines to skip the digit '1' > on the basis of an extra space in the format string, which IMHO is the > behavior any reasonable user would expect. So Amul and I are of one opinion and Tom is of another. Anyone else have an opinion? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Truncating/vacuuming relations on full tablespaces
Thank you for the useful suggestions. PFA patch; I have tried to cover all the points mentioned. Regards, Muhammad Asif Naeem On Wed, Apr 6, 2016 at 9:06 PM, Robert Haas wrote: > On Wed, Apr 6, 2016 at 3:32 AM, Asif Naeem wrote: > >> Oh, I see. I think it's probably not a good idea to skip truncating > >> those maps, but perhaps the option could be defined as making no new > >> entries, rather than ignoring them altogether, so that they never > >> grow. It seems that both of those are defined in such a way that if > >> page Y follows page X in the heap, the entries for page Y in the > >> relation fork will follow the one for page X. So truncating them > >> should be OK; it's just growing them that we need to avoid. > > > > Thank you Robert. PFA basic patch; it introduces an EMERGENCY option to > VACUUM > > that forces it to avoid extending any entries in the VM or FSM. It seems to > work > > fine in simple test scenarios, e.g. > > > >> postgres=# create table test1 as (select generate_series(1,10)); > >> SELECT 10 > >> postgres=# vacuum EMERGENCY test1; > >> VACUUM > >> postgres=# select pg_relation_filepath('test1'); > >> pg_relation_filepath > >> -- > >> base/13250/16384 > >> (1 row) > >> [asif@centos66 inst_96]$ find . | grep base/13250/16384 > >> ./data/base/13250/16384 > >> postgres=# vacuum test1; > >> VACUUM > >> [asif@centos66 inst_96]$ find . | grep base/13250/16384 > >> ./data/base/13250/16384 > >> ./data/base/13250/16384_fsm > >> ./data/base/13250/16384_vm > > > > > > Please do let me know if I missed something or more information is > required. > > Thanks. > > This is too late for 9.6 at this point and certainly requires > discussion anyway, so please add it to the next CommitFest. But in > the meantime, here are a few quick comments: > > You should only support EMERGENCY using the new-style, parenthesized > options syntax. That way, you won't need to make EMERGENCY a keyword. 
> We don't want to do that, and we especially don't want it to be > partially reserved, as you have done here. > > Passing the EMERGENCY flag around in a global variable is probably not > a good idea; use parameter lists. That's what they are for. Also, > calling the variable that decides whether EMERGENCY was set > Extend_VM_FSM is confusing. > > I see why you changed the calling convention for visibilitymap_pin() > and RecordPageWithFreeSpace(), but that's awfully invasive. I wonder > if there's a better way to do that, like maybe having vacuumlazy.c ask > the VM and FSM for their length in pages and then not trying to use > those functions for block numbers that are too large. > > Don't forget to update the documentation. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > VACUUM_EMERGENCY_Option_v2.patch Description: Binary data
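For clarity, the new-style options syntax Robert refers to would spell the proposed option like this (illustrative only; EMERGENCY is a proposed option from this thread, not a committed one):

```sql
-- new-style, parenthesized options syntax: no new keyword required
VACUUM (EMERGENCY) test1;

-- versus the old-style syntax used in the patch, which would force
-- EMERGENCY to become an (at least partially reserved) keyword:
VACUUM EMERGENCY test1;
```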
Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype
On 20 June 2016 at 19:06, David Rowley wrote: > On 18 June 2016 at 05:45, Tom Lane wrote: >> A possible solution is to give deserialize an extra dummy argument, along >> the lines of "deserialize(bytea, internal) returns internal", thereby >> ensuring it can't be called in any non-system-originated contexts. This >> is still rather dangerous if the other argument is variable, as somebody >> might be able to abuse an internal-taking function by naming it as the >> deserialize function for a maliciously-designed aggregate. What I'm >> inclined to do to lock it down further is to drop the "serialtype" >> argument to CREATE AGGREGATE, which seems rather pointless (what else >> would you ever use besides bytea?). Instead, insist that >> serialize/deserialize apply *only* when the transtype is INTERNAL, and >> their signatures are exactly "serialize(internal) returns bytea" and >> "deserialize(bytea, internal) returns internal", never anything else. > > This is also the only way that I can think of to fix this issue. If we > can agree that the fix should be to insist that the deserialisation > function take an additional 2nd parameter of INTERNAL, then I can > write a patch to fix this, and include a patch for documentation > section 35.10 to better explain parallelising user-defined > aggregates. I've gone and implemented the dummy argument approach for deserialization functions. If we go with this, I can then write the docs for 35.10, which will serve to explain parallel user-defined aggregates in detail. Some notes about the patch: I didn't remove the comments at the top of each deserial function, which mention something like: * numeric_avg_serialize(numeric_avg_deserialize(bytea)) must result in a value * which matches the original bytea value. I'm thinking that perhaps these now make a little less sense, given that numeric_avg_deserialize is now numeric_avg_deserialize(bytea, internal). Perhaps these should be updated or removed. 
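As context for the signatures under discussion, a parallel-capable aggregate over an internal transition state is declared along these lines (a sketch with hypothetical function names, following the exact shapes Tom proposes above):

```sql
CREATE AGGREGATE my_avg (numeric) (
    sfunc        = my_avg_accum,        -- transition: (internal, numeric) -> internal
    stype        = internal,
    combinefunc  = my_avg_combine,      -- merges two partial transition states
    serialfunc   = my_avg_serialize,    -- exactly: (internal) -> bytea
    deserialfunc = my_avg_deserialize,  -- exactly: (bytea, internal) -> internal;
                                        -- the dummy internal argument keeps it
                                        -- from being callable from SQL
    finalfunc    = my_avg_final
);
```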
-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services deserialization_function_fix.patch Description: Binary data
Re: [HACKERS] Should XLogInsert() be done only inside a critical section?
On 20/04/16 23:44, Tom Lane wrote:
> Over in <17456.1460832...@sss.pgh.pa.us> I speculated about whether we
> should be enforcing that WAL insertion happen only inside critical
> sections. We don't currently, and a survey of the backend says that
> there are quite a few calls that aren't inside critical sections. But
> there are at least two good reasons why we should, IMO:
>
> 1. It's not very clear that XLogInsert will recover cleanly if it's
> invoked outside a critical section and hits a failure. Certainly, if we
> allow such usage, then every potential error inside that code has to be
> analyzed under both critical-section and normal rules.

It was certainly designed to recover from errors gracefully. XLogInsertRecord(), which does the low-level work of inserting the record to the WAL buffer, has a critical section of its own inside it. The code in xloginsert.c, for constructing the record to insert, operates on backend-private buffers only.

> 2. With no such check, it's quite easy for calling code to forget to
> create a critical section around code stanzas where one is *necessary*
> (because you're changing shared-buffer contents).

Yeah, that is very true.

> Both of these points represent pretty clear hazards for introduction of
> future bugs, whether or not there are any such bugs today. As against
> this, it could be argued that adding critical sections where they're not
> absolutely necessary must make crashing failures more probable than they
> need to be. But first you'd have to prove that they're not absolutely
> necessary, which I'm unsure about because of point #1.

One option would be to put the must-be-in-critical-section check in XLogRegisterBlock(). A WAL record that is associated with a shared buffer almost certainly needs a critical section, but many of the others are safe without it.

- Heikki
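For readers following along, the pattern under discussion looks roughly like this (backend-internal pseudocode sketched from memory; not compilable on its own):

```c
/* Typical WAL-logged change to a shared buffer (pseudocode) */
START_CRIT_SECTION();           /* any ERROR from here on escalates to PANIC */

/* ... modify the page held in the shared buffer ... */
MarkBufferDirty(buf);

/* construct and insert the matching WAL record */
XLogBeginInsert();
XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_UPDATE);
PageSetLSN(BufferGetPage(buf), recptr);

END_CRIT_SECTION();
```

The check both Tom and Heikki discuss would catch, at WAL-registration or WAL-insertion time, callers that forget the START_CRIT_SECTION()/END_CRIT_SECTION() bracket around a shared-buffer modification.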
[HACKERS] Cleanup in contrib/postgres_fdw/deparse.c
Hi, Here is a small patch to remove an unnecessary return from deparseFromExprForRel in contrib/postgres_fdw/deparse.c. Best regards, Etsuro Fujita pg-fdw-deparse-cleanup.patch Description: application/download
Re: [HACKERS] ERROR: ORDER/GROUP BY expression not found in targetlist
Hi, I ran the TPC-H queries on this version and they all succeeded. The error is fixed. commit 705ad7f3b523acae0ddfdebd270b7048b2bb8029 Author: Tom Lane Date: Sun Jun 19 13:11:40 2016 -0400 Regards, Tatsuro Yamada NTT OSS Center On 2016/06/18 1:26, Robert Haas wrote: On Fri, Jun 17, 2016 at 9:29 AM, Robert Haas wrote: On Fri, Jun 17, 2016 at 9:04 AM, Amit Kapila wrote: Attached, please find the updated patch based on the above lines. Due to the addition of a projection path on top of partial paths, some small additional cost is added for parallel paths. In one of the tests in select_parallel.sql the plan was getting converted from a parallel plan to a serial plan due to this cost, so I have changed the test to make it more restrictive. Before changing it, I debugged the test to confirm that this was due to the new projection path cost. I have added two new tests as well, to cover the new code added by the patch. I'm going to go work on this patch now. Here is a revised version, which I plan to commit in a few hours before the workday expires. I kept it basically as Amit had it, but I whacked the comments around some and, more substantively, avoided inserting and/or charging for a Result node when that's not necessary.
Push down join with PHVs (WAS: Re: [HACKERS] [sqlsmith] Failed assertion in postgres_fdw/deparse.c:1116)
On 2016/06/17 21:45, Robert Haas wrote: On Thu, Jun 16, 2016 at 10:44 PM, Etsuro Fujita wrote: On 2016/06/16 22:00, Robert Haas wrote: On Thu, Jun 16, 2016 at 7:05 AM, Etsuro Fujita wrote: ISTM that a more robust solution to this is to push down the ft1-ft2-ft3 join with the PHV by extending deparseExplicitTargetList() and/or something else, so that we can send a remote query like: select ft1.a, (ft3.a IS NOT NULL) from (ft1 inner join ft2 on ft1.a = ft2.a) left join ft3 on ft1.a = ft3.a I completely agree we should have that, but not for 9.6. OK, I'll work on that for 10.0. Great, that will be a nice improvement. Here is a patch for that. * Modified deparseExplicitTargetList as well as is_foreign_expr and deparseExpr so that a join is pushed down with PHVs if not only the join but also the PHVs are safe to execute remotely. The existing deparsing logic can't build a remote query that involves subqueries, so the patch currently gives up pushing down any upper joins involving a join (or foreign table!) with PHVs in the tlist. But I think the patch could easily be extended to support that, once we support deparsing subqueries. * Besides that, simplified foreign_join_ok a bit by adding a new flag, allow_upper_foreign_join, to PgFdwRelationInfo, replacing the existing pushdown_safe flag with it. See the comments in postgres_fdw.h. Comments welcome. Best regards, Etsuro Fujita pg-fdw-phv-handling-v1.patch Description: application/download
Re: [HACKERS] Parallelized polymorphic aggs, and aggtype vs aggoutputtype
On 18 June 2016 at 09:29, Tom Lane wrote: > So at this point my proposal is: > > 1. Add an OID-list field to Aggref holding the data types of the > user-supplied arguments. This can be filled in parse analysis since it > won't change thereafter. Replace calls to get_aggregate_argtypes() with > use of the OID-list field, or maybe keep the function but have it look > at the OID list not the physical args list. > > 2. Add an OID field to Aggref holding the resolved (non polymorphic) > transition data type's OID. For the moment we could just set this > during parse analysis, since we do not support changing the transtype > of an existing aggregate. If we ever decide we want to allow that, > the lookup could be postponed into the planner. Replace calls to > resolve_aggregate_transtype with use of this field. > (resolve_aggregate_transtype() wouldn't disappear, but it would be > invoked only once during parse analysis, not multiple times per query.) > > #2 isn't necessary to fix the bug, but as long as we are doing #1 > we might as well do #2 too, to buy back some of the planner overhead > added by parallel query. > > Barring objections I'll make this happen by tomorrow. Thanks for committing this change. From reading over the commit I see you've left: + * XXX need more documentation about partial aggregation here Would the attached cover off what you had imagined might go here? > I still don't like anything about the way that the matching logic in > fix_combine_agg_expr works, but at the moment I can't point to any > observable bug from that, so I'll not try to change it before beta2. Right, I see more work is needed in that area, as there's now an out-of-date comment at the top of search_indexed_tlist_for_partial_aggref. The comment claims we only ignore aggoutputtype, but you added aggtranstype and aggargtypes to that list. 
-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services aggref_comments.patch Description: Binary data