Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On Mon, Jan 19, 2015 at 10:53 PM, tim_wilson <tim.wil...@telogis.com> wrote:
> Was slightly optimistic that this comment in the release notes might mean
> that my bug with bloat on hot tables might have been fixed in 9.4:
>
>     Make VACUUM properly report dead but not-yet-removable rows to the
>     statistics collector (Hari Babu)
>     Previously these were reported as live rows.
>
> Unfortunately that is not the case, and we still have the problem in 9.4.

As far as I can tell from reviewing the thread, what we need to do here is
basically invent HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS.  There was a lot
of concern about doing that in the back-branches, but I'm skeptical that
the concern is justified.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Was slightly optimistic that this comment in the release notes might mean
that my bug with bloat on hot tables might have been fixed in 9.4:

    Make VACUUM properly report dead but not-yet-removable rows to the
    statistics collector (Hari Babu)
    Previously these were reported as live rows.

Unfortunately that is not the case, and we still have the problem in 9.4.
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On Thu, Jun 12, 2014 at 01:40:59PM +0200, Andres Freund wrote:
> Hi Tom,
>
> On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> > > I figured it'd be easy enough to get a better estimate by adding
> > > another counter to count just LIVE and INSERT_IN_PROGRESS tuples
> > > (thus effectively assuming that in-progress inserts and deletes
> > > will both commit).
>
> Did you plan to backpatch that? My inclination would be no...
>
> > I did that, and found that it helped Tim's test case not at all :-(.
> > A bit of sleuthing revealed that HeapTupleSatisfiesVacuum actually
> > returns INSERT_IN_PROGRESS for any tuple whose xmin isn't committed,
> > regardless of whether the transaction has since marked it for
> > deletion:
> >
> >     /*
> >      * It'd be possible to discern between INSERT/DELETE in progress
> >      * here by looking at xmax - but that doesn't seem beneficial for
> >      * the majority of callers and even detrimental for some. We'd
> >      * rather have callers look at/wait for xmin than xmax. It's
> >      * always correct to return INSERT_IN_PROGRESS because that's
> >      * what's happening from the view of other backends.
> >      */
> >     return HEAPTUPLE_INSERT_IN_PROGRESS;
> >
> > It did not use to blow this question off: back around 8.3 you got
> > DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we
> > need less laziness + fuzzy thinking here.  Maybe we should have a
> > separate HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it
> > *really* the case that callers other than VACUUM itself are okay
> > with failing to make this distinction?  I'm dubious: there are very
> > few if any callers that treat the INSERT and DELETE cases exactly
> > alike.
>
> My current position on this is that we should leave the code as is in
> <9.4 and return HEAPTUPLE_INSERT_IN_PROGRESS for 9.4/master. Would you
> be ok with that? The second best thing imo would be to discern and
> return HEAPTUPLE_INSERT_IN_PROGRESS/HEAPTUPLE_DELETE_IN_PROGRESS for
> the respective cases. Which way would you like to go?

Did we ever come to a conclusion on this?

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Given that this seems to have slipped off the hackers radar (or into the
too-hard basket) I have constructed a horrible solution.  I will stop
using autovacuum for this relation, I will use our own system to monitor
the relation, and I will reset pg_class.reltuples on this relation to the
correct value after vacuum is done.

I note that vacuum.c has comments in vac_update_relstats that changes to
pg_class are done without a transaction.  Are there dangers in my doing:

    update pg_class set reltuples=6
    where relkind='r' and relname='my_hot_table';
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Hi Tom,

On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> > I figured it'd be easy enough to get a better estimate by adding
> > another counter to count just LIVE and INSERT_IN_PROGRESS tuples
> > (thus effectively assuming that in-progress inserts and deletes will
> > both commit).

Did you plan to backpatch that? My inclination would be no...

> I did that, and found that it helped Tim's test case not at all :-(.
> A bit of sleuthing revealed that HeapTupleSatisfiesVacuum actually
> returns INSERT_IN_PROGRESS for any tuple whose xmin isn't committed,
> regardless of whether the transaction has since marked it for deletion:
>
>     /*
>      * It'd be possible to discern between INSERT/DELETE in progress
>      * here by looking at xmax - but that doesn't seem beneficial for
>      * the majority of callers and even detrimental for some. We'd
>      * rather have callers look at/wait for xmin than xmax. It's
>      * always correct to return INSERT_IN_PROGRESS because that's
>      * what's happening from the view of other backends.
>      */
>     return HEAPTUPLE_INSERT_IN_PROGRESS;
>
> It did not use to blow this question off: back around 8.3 you got
> DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
> less laziness + fuzzy thinking here.  Maybe we should have a separate
> HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
> the case that callers other than VACUUM itself are okay with failing
> to make this distinction?  I'm dubious: there are very few if any
> callers that treat the INSERT and DELETE cases exactly alike.

My current position on this is that we should leave the code as is in
<9.4 and return HEAPTUPLE_INSERT_IN_PROGRESS for 9.4/master. Would you
be ok with that? The second best thing imo would be to discern and
return HEAPTUPLE_INSERT_IN_PROGRESS/HEAPTUPLE_DELETE_IN_PROGRESS for
the respective cases. Which way would you like to go?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On 2014-06-09 11:24:22 -0700, Kevin Grittner wrote:
> Andres Freund <and...@2ndquadrant.com> wrote:
> > On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:
> >
> > I am not sure, given predicate.c's coding, how
> > HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you
> > elaborate, since that's the contentious point with Tom? Since 'both
> > in progress' can only happen if xmin and xmax are the same toplevel
> > xid and you resolve subxids to toplevel xids I think it should
> > currently be safe either way?
>
> The only way that it could be a problem is if the DELETE is in a
> subtransaction which might get rolled back without rolling back the
> INSERT.

The way I understand the code, in that case the subxid in xmax would
have been resolved to the toplevel xid:

    /*
     * Find top level xid.  Bail out if xid is too early to be a
     * conflict, or if it's our own xid.
     */
    if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
        return;
    xid = SubTransGetTopmostTransaction(xid);
    if (TransactionIdPrecedes(xid, TransactionXmin))
        return;
    if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
        return;

That should essentially make that case harmless, right?  So it seems
the optimization (and pessimization in other cases) of only tracking
toplevel xids saves the day here?

> If we ignore the conflict because we assume the INSERT will be negated
> by the DELETE, and that doesn't happen, we would get false negatives
> which would compromise correctness.  If we assume that the DELETE
> might not happen when the DELETE is not in a separate subtransaction
> we might get a false positive, which would only be a performance hit.
> If we know either is possible and have a way to check in predicate.c,
> it's fine to check it there.

Given the above I don't think this can currently happen.  Am I
understanding it correctly?  If so, it certainly deserves a comment...

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund <and...@2ndquadrant.com> wrote:
> On 2014-06-09 11:24:22 -0700, Kevin Grittner wrote:
> > The only way that it could be a problem is if the DELETE is in a
> > subtransaction which might get rolled back without rolling back the
> > INSERT.
>
> The way I understand the code, in that case the subxid in xmax would
> have been resolved to the toplevel xid:
>
>     /*
>      * Find top level xid.  Bail out if xid is too early to be a
>      * conflict, or if it's our own xid.
>      */
>     if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
>         return;
>     xid = SubTransGetTopmostTransaction(xid);
>     if (TransactionIdPrecedes(xid, TransactionXmin))
>         return;
>     if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
>         return;
>
> That should essentially make that case harmless, right?

Hmm.  Since the switch statement above doesn't limit the
HEAPTUPLE_DELETE_IN_PROGRESS or HEAPTUPLE_INSERT_IN_PROGRESS cases by
visibility, we are safe.  I had been thinking that we had early returns
based on visibility, like we do for HEAPTUPLE_LIVE and
HEAPTUPLE_RECENTLY_DEAD.  Since we don't do that, there is no problem
with the code either before or after your recent change.  Apologies
that I didn't notice that sooner.

It might be possible that with more guarantees of what those return
codes mean (possibly by adding the new one mentioned by Tom) we could
add that early return in one or both cases, which would better optimize
some corner cases involving subtransactions.  But now doesn't seem like
a good time to worry about that.  Such a micro-optimization would not
be a sane candidate for back-patching, or for introducing this late in
a release cycle.

So, nothing to see here, folks.  Move along.  predicate.c is neutral in
this matter.

Good spot, BTW!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On Sat, Jun 7, 2014 at 1:28 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> > > I figured it'd be easy enough to get a better estimate by adding
> > > another counter to count just LIVE and INSERT_IN_PROGRESS tuples
> > > (thus effectively assuming that in-progress inserts and deletes
> > > will both commit).
> >
> > I did that, and found that it helped Tim's test case not at all :-(.
> > A bit of sleuthing revealed that HeapTupleSatisfiesVacuum actually
> > returns INSERT_IN_PROGRESS for any tuple whose xmin isn't committed,
> > regardless of whether the transaction has since marked it for
> > deletion:
> >
> >     /*
> >      * It'd be possible to discern between INSERT/DELETE in progress
> >      * here by looking at xmax - but that doesn't seem beneficial for
> >      * the majority of callers and even detrimental for some. We'd
> >      * rather have callers look at/wait for xmin than xmax. It's
> >      * always correct to return INSERT_IN_PROGRESS because that's
> >      * what's happening from the view of other backends.
> >      */
> >     return HEAPTUPLE_INSERT_IN_PROGRESS;
>
> That's only been the case since a couple of days ago.  I really wasn't
> sure whether to go that way or discern the two cases.  That changed in
> the wake of:
> http://www.postgresql.org/message-id/20140530143150.GA11051@localhost

Won't this change impact the calculation of the number of live rows for
analyze (acquire_sample_rows() considers the HEAPTUPLE_DELETE_IN_PROGRESS
tuples as liverows for tuples updated by transactions other than the
current transaction)?

Even if we think that the estimates are okay, the below comment in
acquire_sample_rows() doesn't seem to suggest it.

    /*
     * We count delete-in-progress rows as still live, using
     * the same reasoning given above; but we don't bother to
     * include them in the sample.
     * ..
     */

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On Fri, Jun 6, 2014 at 3:44 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> It did not use to blow this question off: back around 8.3 you got
> DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
> less laziness + fuzzy thinking here.  Maybe we should have a separate
> HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
> the case that callers other than VACUUM itself are okay with failing
> to make this distinction?

I think that would be a good idea for conceptual clarity if nothing
else.  If callers are OK with it, then they can treat both of those
codes alike; but then at least there's clear evidence as to the
author's intent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On 2014-06-09 10:14:32 -0400, Robert Haas wrote:
> On Fri, Jun 6, 2014 at 3:44 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > It did not use to blow this question off: back around 8.3 you got
> > DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we
> > need less laziness + fuzzy thinking here.  Maybe we should have a
> > separate HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it
> > *really* the case that callers other than VACUUM itself are okay
> > with failing to make this distinction?
>
> I think that would be a good idea for conceptual clarity if nothing
> else.  If callers are OK with it, then they can treat both of those
> codes alike; but then at least there's clear evidence as to the
> author's intent.

I am happy to introduce the code for that.  But it'd have to be >=9.4
or >9.4?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund <and...@2ndquadrant.com> writes:
> On 2014-06-09 10:14:32 -0400, Robert Haas wrote:
> > I think that would be a good idea for conceptual clarity if nothing
> > else.  If callers are OK with it, then they can treat both of those
> > codes alike; but then at least there's clear evidence as to the
> > author's intent.
>
> I am happy to introduce the code for that.  But it'd have to be >=9.4
> or >9.4?

We need a solution that can be back-patched, unless you're prepared to
revert what you already did to HTSV in the back branches.

Having said that, it's not clear to me that we couldn't change HTSV's
API in the back branches.  What third-party code would be likely to be
depending on it?

            regards, tom lane
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On 2014-06-09 10:30:43 -0400, Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
> > On 2014-06-09 10:14:32 -0400, Robert Haas wrote:
> > > I think that would be a good idea for conceptual clarity if
> > > nothing else.  If callers are OK with it, then they can treat both
> > > of those codes alike; but then at least there's clear evidence as
> > > to the author's intent.
> >
> > I am happy to introduce the code for that.  But it'd have to be
> > >=9.4 or >9.4?
>
> We need a solution that can be back-patched, unless you're prepared to
> revert what you already did to HTSV in the back branches.

Well, I think reverting surely wouldn't be the right cure.  It fixes a
somewhat nasty bug.  I'd certainly be prepared to add the two lines
necessary to make it return DELETE_IN_PROGRESS after trying to
understand Kevin's email about predicate.c and going through the other
callers another time.

I did ask about what people think is the more appropriate return value.
And the only person that had voiced an opinion was Alvaro, agreeing
that it's more correct to return INSERT_IN_PROGRESS...  Don't make this
about me insisting on anything.

> Having said that, it's not clear to me that we couldn't change HTSV's
> API in the back branches.  What third-party code would be likely to be
> depending on it?

I'm not sure.  I could imagine tools like pg_repack depending on it (it
doesn't).  I started to search for external users of the function and
have only found a bunch of forks of postgres so far, several of which
I've never heard of before.

I think it's more reasonable to return DELETE_IN_PROGRESS in existing
branches and then go with the new return value in 9.5 or maybe 9.4.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund <and...@2ndquadrant.com> wrote:
> Well, I think reverting surely wouldn't be the right cure.  It fixes a
> somewhat nasty bug.  I'd certainly be prepared to add the two lines
> necessary to make it return DELETE_IN_PROGRESS after trying to
> understand Kevin's email about predicate.c and going through the other
> callers another time.

I'm not actually sure yet whether the current state of affairs causes a
problem for the serializable transaction isolation level
implementation.  The most important thing to me is that whatever is
implemented is accurately documented in code comments so I can make any
necessary adjustments to code in predicate.c -- or possibly determine
that I *do* need some change to HTSV.

Right now the HTSV embedded code comments suggest that the enum names
and comments don't accurately describe the conditions under which they
are returned, but I can't find anything else which does, short of
reverse-engineering that from some fairly complex code.  Perhaps it
would be good if you could provide a concise description of the
conditions under which each value could currently be returned on this
(or the related) thread before we talk about what changes might be
needed?  Maybe this is clear to others involved in the discussion, but
I am not confident that I fully understand what gets returned under
what conditions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On 2014-06-09 08:00:52 -0700, Kevin Grittner wrote:
> I'm not actually sure yet whether the current state of affairs causes
> a problem for the serializable transaction isolation level
> implementation.

I'd replied in the other thread before noticing you'd replied here...
From what I understand right now it's not affected at all.  I tried to
make things a bit clearer there - but I am not sure I've succeeded.
I'm certainly willing to explain things further if you can tell me
which area is unclear.

> Right now the HTSV embedded code comments suggest that the enum names
> and comments don't accurately describe the conditions under which
> they are returned, but I can't find anything else which does, short
> of reverse-engineering that from some fairly complex code.

Not really.  You could even argue the current code more correctly
represents the (old) comments:

    HEAPTUPLE_INSERT_IN_PROGRESS,   /* inserting xact is still in progress */
    HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */

the current code will return INSERT_IN_PROGRESS even if the tuple has
*also* been deleted in another xact...  I think the problem here is
that there's simply no way to really represent that case accurately
with the current API.

> Perhaps it would be good if you could provide a concise description
> of the conditions under which each value could currently be returned
> on this (or the related) thread before we talk about what changes
> might be needed?  Maybe this is clear to others involved in the
> discussion, but I am not confident that I fully understand what gets
> returned under what conditions.

HEAPTUPLE_DEAD,             /* tuple is dead and deletable */
1) xmin has committed, xmax has committed and wasn't just a locker.
   Xmax precedes OldestXmin.

HEAPTUPLE_LIVE,             /* tuple is live (committed, no deleter) */
1) xmin has committed, xmax unset
2) xmin has committed, xmax is locked only. Status of xmax is irrelevant
3) xmin has committed, xmax has aborted.

HEAPTUPLE_RECENTLY_DEAD,    /* tuple is dead, but not deletable yet */
1) xmin has committed, xmax has committed and wasn't only a locker.
   But xmax doesn't precede OldestXmin.

HEAPTUPLE_INSERT_IN_PROGRESS,   /* inserting xact is still in progress */
new:
1) xmin is in progress, xmin is the current backend, xmax is invalid
2) xmin is in progress, xmin is the current backend, xmax only a locker
3) xmin is in progress, xmin is the current backend, xmax aborted
4) xmin is in progress, xmin is *not* the current backend, xmax is
   irrelevant
old:
1) xmin is in progress, xmax is invalid
2) xmin is in progress, xmax is only a locker

HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
new:
1) xmin has committed, xmax is in progress, xmax is not just a locker
2) xmin is in progress, xmin is the current backend, xmax is not just a
   locker and in progress.
old:
1) xmin has committed, xmax is in progress, xmax is not just a locker
2) xmin is in progress, xmax is set and not just a locker

Note that the 2) case here never checked xmax's status.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund <and...@2ndquadrant.com> wrote:
> I tried to make things a bit clearer there - but I am not sure I've
> succeeded.  I'm certainly willing to explain things further if you can
> tell me which area is unclear.

Thanks!  IMO, something like this should be included in the code
comments.

> >     HEAPTUPLE_INSERT_IN_PROGRESS,   /* inserting xact is still in progress */
> >     HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
>
> the current code will return INSERT_IN_PROGRESS even if the tuple has
> *also* been deleted in another xact...  I think the problem here is
> that there's simply no way to really represent that case accurately
> with the current API.

For purposes of predicate.c, if the "also deleted" activity might be
rolled back without rolling back the insert, INSERT_IN_PROGRESS is the
only correct value.  If they will either both commit or neither will
commit, predicate.c would be more efficient if HEAPTUPLE_RECENTLY_DEAD
was returned, but I think HEAPTUPLE_INSERT_IN_PROGRESS would be OK from
a correctness PoV.

> > Perhaps it would be good if you could provide a concise description
> > of the conditions under which each value could currently be returned
> > on this (or the related) thread before we talk about what changes
> > might be needed?  Maybe this is clear to others involved in the
> > discussion, but I am not confident that I fully understand what gets
> > returned under what conditions.
>
> HEAPTUPLE_DEAD,             /* tuple is dead and deletable */
> 1) xmin has committed, xmax has committed and wasn't just a locker.
>    Xmax precedes OldestXmin.

Perfect.

> HEAPTUPLE_LIVE,             /* tuple is live (committed, no deleter) */
> 1) xmin has committed, xmax unset
> 2) xmin has committed, xmax is locked only. Status of xmax is irrelevant
> 3) xmin has committed, xmax has aborted.

Perfect.

> HEAPTUPLE_RECENTLY_DEAD,    /* tuple is dead, but not deletable yet */
> 1) xmin has committed, xmax has committed and wasn't only a locker.
>    But xmax doesn't precede OldestXmin.

For my purposes, it would be better if this also included:

2) xmin is in progress, xmax matches (or includes) xmin

... but that would be only a performance tweak.

> HEAPTUPLE_INSERT_IN_PROGRESS,   /* inserting xact is still in progress */
> new:
> 1) xmin is in progress, xmin is the current backend, xmax is invalid
> 2) xmin is in progress, xmin is the current backend, xmax only a locker
> 3) xmin is in progress, xmin is the current backend, xmax aborted
> 4) xmin is in progress, xmin is *not* the current backend, xmax is
>    irrelevant
> old:
> 1) xmin is in progress, xmax is invalid
> 2) xmin is in progress, xmax is only a locker

I think this is OK from a correctness PoV.  There may be an opportunity
to optimize.  I will look more closely at whether it seems likely to
matter much, and what sort of change in the return conditions or in
predicate.c might be needed.

> HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
> new:
> 1) xmin has committed, xmax is in progress, xmax is not just a locker
> 2) xmin is in progress, xmin is the current backend, xmax is not just
>    a locker and in progress.

I'm not clear on how 2) could happen unless xmax is the current backend
or a subtransaction thereof.  Could you clarify?

> old:
> 1) xmin has committed, xmax is in progress, xmax is not just a locker
> 2) xmin is in progress, xmax is set and not just a locker
>
> Note that the 2) case here never checked xmax's status.

Again, I'm not sure how 2) could happen unless they involve the same
top-level transaction.  What am I missing?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:
> Andres Freund <and...@2ndquadrant.com> wrote:
> > the current code will return INSERT_IN_PROGRESS even if the tuple
> > has *also* been deleted in another xact...  I think the problem here
> > is that there's simply no way to really represent that case
> > accurately with the current API.
>
> For purposes of predicate.c, if the "also deleted" activity might be
> rolled back without rolling back the insert, INSERT_IN_PROGRESS is the
> only correct value.  If they will either both commit or neither will
> commit, predicate.c would be more efficient if HEAPTUPLE_RECENTLY_DEAD
> was returned, but I think HEAPTUPLE_INSERT_IN_PROGRESS would be OK
> from a correctness PoV.

That's basically the argument for the new behaviour.  But I am not
sure, given predicate.c's coding, how HEAPTUPLE_DELETE_IN_PROGRESS
could cause problems.  Could you elaborate, since that's the
contentious point with Tom?  Since 'both in progress' can only happen
if xmin and xmax are the same toplevel xid and you resolve subxids to
toplevel xids I think it should currently be safe either way?

> > HEAPTUPLE_RECENTLY_DEAD,    /* tuple is dead, but not deletable yet */
> > 1) xmin has committed, xmax has committed and wasn't only a locker.
> >    But xmax doesn't precede OldestXmin.
>
> For my purposes, it would be better if this also included:
>
> 2) xmin is in progress, xmax matches (or includes) xmin
>
> ... but that would be only a performance tweak.

I don't see that happening as there's several callers for which it is
important to know whether the xacts are still alive or not.

> > HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
> > new:
> > 1) xmin has committed, xmax is in progress, xmax is not just a locker
> > 2) xmin is in progress, xmin is the current backend, xmax is not
> >    just a locker and in progress.
>
> I'm not clear on how 2) could happen unless xmax is the current
> backend or a subtransaction thereof.  Could you clarify?
>
> > old:
> > 1) xmin has committed, xmax is in progress, xmax is not just a locker
> > 2) xmin is in progress, xmax is set and not just a locker
> >
> > Note that the 2) case here never checked xmax's status.
>
> Again, I'm not sure how 2) could happen unless they involve the same
> top-level transaction.  What am I missing?

Right, both can only happen if the tuple is created and deleted in the
same backend.  Is that in contradiction to something you see?

Andres
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund <and...@2ndquadrant.com> writes:
> On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:
> > For purposes of predicate.c, if the "also deleted" activity might be
> > rolled back without rolling back the insert, INSERT_IN_PROGRESS is
> > the only correct value.
>
> That's basically the argument for the new behaviour.  But I am not
> sure, given predicate.c's coding, how HEAPTUPLE_DELETE_IN_PROGRESS
> could cause problems.  Could you elaborate, since that's the
> contentious point with Tom?  Since 'both in progress' can only happen
> if xmin and xmax are the same toplevel xid and you resolve subxids to
> toplevel xids I think it should currently be safe either way?

Assuming that Kevin's describing his needs correctly, we could resolve
this by inventing HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS.  VACUUM
could assume that that's a probably-dead tuple, while SSI could do
something different.  I'm not sure if it's worth special-casing
xmin == xmax, where the tuple is guaranteed to never be of interest to
any other transaction?

The reason this stuff is not too carefully spec'd is that when HTSV was
written, there was no expectation that there was any correctness issue
around which of these cases was returned.  I wonder whether SSI should
be using HTSV at all.

            regards, tom lane
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Andres Freund and...@2ndquadrant.com wrote: On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote: I am not sure, given predicate.c's coding, how HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate, since that's the contentious point with Tom? Since 'both in progress' can only happen if xmin and xmax are the same toplevel xid and you resolve subxids to toplevel xids I think it should currently be safe either way? The only way that it could be a problem is if the DELETE is in a subtransaction which might get rolled back without rolling back the INSERT. If we ignore the conflict because we assume the INSERT will be negated by the DELETE, and that doesn't happen, we would get false negatives which would compromise correctness. If we assume that the DELETE might not happen when the DELETE is not in a separate subtransaction we might get a false positive, which would only be a performance hit. If we know either is possible and have a way to check in predicate.c, it's fine to check it there. HEAPTUPLE_RECENTLY_DEAD, /* tuple is dead, but not deletable yet */ 1) xmin has committed, xmax has committed and wasn't only a locker. But xmax doesn't precede OldestXmin. For my purposes, it would be better if this also included: 2) xmin is in progress, xmax matches (or includes) xmin ... but that would be only a performance tweak. I don't see that happening as there's several callers for which it is important to know whether the xacts are still alive or not. OK HEAPTUPLE_DELETE_IN_PROGRESS /* deleting xact is still in progress */ new: 1) xmin has committed, xmax is in progress, xmax is not just a locker 2) xmin is in progress, xmin is the current backend, xmax is not just a locker and in progress. I'm not clear on how 2) could happen unless xmax is the current backend or a subtransaction thereof. Could you clarify? 
>> old:
>> 1) xmin has committed, xmax is in progress, xmax is not just a locker
>> 2) xmin is in progress, xmax is set and not just a locker
>> Note that the 2) case here never checked xmax's status.
>>
>> Again, I'm not sure how 2) could happen unless they involve the same top-level transaction. What am I missing?
>
> Right, both can only happen if the tuple is created and deleted in the same backend. Is that in contradiction to something you see?

Well, you're making a big point that the status of xmax was not checked in the old code. If xmax is the same as xmin and xmin is in progress, the additional check seems redundant -- unless I'm missing something.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
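The difference being probed here -- the old rule 2 never consulting xmax's status -- can be made concrete with a toy model. This is hypothetical code, not the actual HeapTupleSatisfiesVacuum logic; the struct fields simply mirror the conditions in the quoted rules.

```c
#include <stdbool.h>

typedef struct
{
    bool xmin_in_progress;   /* inserter is this backend's open xact */
    bool xmax_set;           /* an xmax has been assigned */
    bool xmax_locker_only;   /* xmax only locked the tuple */
    bool xmax_in_progress;   /* deleting (sub)xact still running */
} RuleInput;

/* old rule 2: never looked at xmax's status */
bool
old_rule2(const RuleInput *t)
{
    return t->xmin_in_progress && t->xmax_set && !t->xmax_locker_only;
}

/* new rule 2: additionally requires the deleter to still be in progress */
bool
new_rule2(const RuleInput *t)
{
    return old_rule2(t) && t->xmax_in_progress;
}
```

The two rules diverge exactly when xmax was set by a (sub)transaction that is no longer running -- for instance a subtransaction that has since aborted -- which is where the extra status check matters.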
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Tom Lane t...@sss.pgh.pa.us wrote:
> The reason this stuff is not too carefully spec'd is that when HTSV was written, there was no expectation that there was any correctness issue around which of these cases was returned. I wonder whether SSI should be using HTSV at all.

That's certainly a reasonable question. When I was writing the SSI code, I was blissfully unaware of HTSV and had coded up a way to check this which seemed to work for all the tests we had. Jeff Davis, reviewing the code, was concerned that such separate code was more likely to miss something or to break as visibility handling changed. He argued that HTSV was basically checking for the same things I was, and that a redundant version which did the check differently was a bad idea. Here is where it was discussed during development:

http://www.postgresql.org/message-id/flat/1296499247.11513.777.camel@jdavis#1296499247.11513.777.camel@jdavis

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Kevin Grittner wrote:
> Andres Freund and...@2ndquadrant.com wrote:
>> On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:
>>
>>> old:
>>> 1) xmin has committed, xmax is in progress, xmax is not just a locker
>>> 2) xmin is in progress, xmax is set and not just a locker
>>> Note that the 2) case here never checked xmax's status.
>>
>>> Again, I'm not sure how 2) could happen unless they involve the same top-level transaction. What am I missing?
>>
>> Right, both can only happen if the tuple is created and deleted in the same backend. Is that in contradiction to something you see?
>
> Well, you're making a big point that the status of xmax was not checked in the old code. If xmax is the same as xmin and xmin is in progress, the additional check seems redundant -- unless I'm missing something.

IIRC the case that prompted the fix was a CREATE INDEX in the same transaction that created a tuple which was later deleted by an aborted subtransaction. If the creating transaction is not this backend, that's fine. But if the creating transaction IsCurrentTransactionId() then we need to be careful about aborted subxacts: if a tuple was deleted by an aborted subxact then it's still visible to this transaction. Xmax must be checked in this case.

Note that the difference is pretty specific to CREATE INDEX. Vacuuming doesn't have to worry about such cases, mainly because you can't run VACUUM inside a transaction block.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
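Álvaro's aborted-subxact scenario can be sketched as follows. This is a hypothetical, grossly simplified model -- not PostgreSQL source: the transaction-state lookup is mocked as a struct holding our top-level xid and one aborted subxact, and tuples from other transactions are handled only for the trivial no-deleter case.

```c
#include <stdbool.h>

typedef unsigned int TransactionId;

/* Mocked transaction context: our top-level xid plus one subxact of
 * ours that has aborted (0 if none).  In PostgreSQL this information
 * comes from the transaction state stack. */
typedef struct
{
    TransactionId current_xid;
    TransactionId aborted_subxid;
} XactContext;

static bool
is_current_transaction_id(const XactContext *ctx, TransactionId xid)
{
    return xid != 0 &&
           (xid == ctx->current_xid || xid == ctx->aborted_subxid);
}

/* Should CREATE INDEX treat this heap tuple as live for indexing? */
bool
index_tuple_is_live(const XactContext *ctx,
                    TransactionId xmin, TransactionId xmax)
{
    if (is_current_transaction_id(ctx, xmin))
    {
        /*
         * Tuple created by our own transaction.  If it was deleted by
         * a subxact of ours that aborted, the delete is void and the
         * tuple is still visible to us -- so xmax's status must be
         * checked, which is the point being made above.
         */
        if (xmax != 0 && xmax == ctx->aborted_subxid)
            return true;
        return xmax == 0;
    }
    /* Tuples from other transactions: simplified to the trivial case. */
    return xmax == 0;
}
```

An old-style rule that ignored xmax's status would wrongly classify the aborted-subxact tuple as delete-in-progress and leave it out of the new index.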
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Tom Lane t...@sss.pgh.pa.us wrote:
> Assuming that Kevin's describing his needs correctly, we could resolve this by inventing HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS. VACUUM could assume that that's a probably-dead tuple, while SSI could do something different.

That could work. On the other hand, the old code, where such a transaction showed as HEAPTUPLE_DELETE_IN_PROGRESS, might still work for predicate.c as long as it's clear that this return code sometimes means insert and delete are both in progress and the insert might commit without committing the delete. Those conditions could be identified within predicate.c, although it seems like that would involve duplicating some work which is already done in HTSV, with the usual costs and risks of duplicate code.

> I'm not sure if it's worth special-casing xmin == xmax, where the tuple is guaranteed never to be of interest to any other transaction?

That could be checked in predicate.c. I see no reason to create a separate return code for it if it's not of interest to other callers. It could even be left as a later optimization, since a false positive serialization failure doesn't compromise correctness, but it is probably easy enough to cover that we should just do so now.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
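The predicate.c-side check Kevin describes can be sketched as a single predicate. This is hypothetical code, not PostgreSQL's predicate.c: it only captures the decision logic discussed in this exchange.

```c
#include <stdbool.h>

typedef unsigned int TransactionId;

/*
 * On seeing a delete-in-progress tuple, does SSI still need to consider
 * it for a rw-conflict?
 */
bool
delete_needs_conflict_check(TransactionId xmin, TransactionId xmax)
{
    /*
     * xmin == xmax: the very same xid inserted and deleted the tuple.
     * Whether that xact commits or aborts, no other transaction can
     * ever see the tuple, so skipping the check is safe.
     */
    if (xmin == xmax)
        return false;

    /*
     * Otherwise the delete may sit in a subtransaction that aborts
     * while the insert commits; ignoring the conflict could yield a
     * false negative (a correctness problem), so check it.  The worst
     * case of checking unnecessarily is a false-positive serialization
     * failure, which costs performance but not correctness.
     */
    return true;
}
```

This mirrors Kevin's asymmetry: erring toward checking is merely a performance hit, erring toward skipping would compromise correctness.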
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> I figured it'd be easy enough to get a better estimate by adding another counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively assuming that in-progress inserts and deletes will both commit). I did that, and found that it helped Tim's test case not at all :-(. A bit of sleuthing revealed that HeapTupleSatisfiesVacuum actually returns INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of whether the transaction has since marked it for deletion:
>
>     /*
>      * It'd be possible to discern between INSERT/DELETE in progress
>      * here by looking at xmax - but that doesn't seem beneficial for
>      * the majority of callers and even detrimental for some. We'd
>      * rather have callers look at/wait for xmin than xmax. It's
>      * always correct to return INSERT_IN_PROGRESS because that's
>      * what's happening from the view of other backends.
>      */
>     return HEAPTUPLE_INSERT_IN_PROGRESS;

That's only been the case for a couple of days. I really wasn't sure whether to go that way or discern the two cases. That changed in the wake of:
http://www.postgresql.org/message-id/20140530143150.GA11051@localhost
I tried to solicit feedback (e.g. by CCing you :)) but I mostly failed. Alvaro agreed, on IM, that it's better this way.

> It did not use to blow this question off: back around 8.3 you got DELETE_IN_PROGRESS if the tuple had a delete pending. I think we need less laziness + fuzzy thinking here.

My argument for not discerning wasn't that it's hard to do, but that it might confuse callers more the other way round. E.g. doing a XactLockTableWait(xmax) might not be sufficient for the tuple being alive.

> Maybe we should have a separate HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?

Maybe.

> Is it *really* the case that callers other than VACUUM itself are okay with failing to make this distinction? I'm dubious: there are very few if any callers that treat the INSERT and DELETE cases exactly alike.
I looked through all of them and saw none that'd be problematic. And for some, like predicate.c, the new behaviour seems to be better. Most of the ones that care about INSERT/DELETE_IN_PROGRESS wait on xmin/xmax respectively.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
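The caller pattern Andres describes -- wait on xmin for an in-progress insert, on xmax for an in-progress delete -- can be reduced to a one-line dispatch. This is a hypothetical sketch, not PostgreSQL source; the real callers block via XactLockTableWait, which is elided here.

```c
typedef unsigned int TransactionId;

typedef enum
{
    INSERT_IN_PROGRESS,
    DELETE_IN_PROGRESS
} InProgressStatus;

/*
 * Which transaction should a caller wait on (via XactLockTableWait in
 * the real code) before re-examining the tuple?
 */
TransactionId
xid_to_wait_on(InProgressStatus status,
               TransactionId xmin, TransactionId xmax)
{
    /*
     * For a tuple whose inserter has also deleted it, waiting on xmax
     * alone would not establish that the tuple is alive (Andres's
     * caveat above); reporting INSERT_IN_PROGRESS steers callers to
     * wait on xmin, whose outcome settles both the insert and the
     * delete.
     */
    return (status == INSERT_IN_PROGRESS) ? xmin : xmax;
}
```

This is why returning INSERT_IN_PROGRESS for the both-in-progress case stays safe under the existing caller convention.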