Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2015-01-20 Thread Robert Haas
On Mon, Jan 19, 2015 at 10:53 PM, tim_wilson tim.wil...@telogis.com wrote:
 I was slightly optimistic that this comment in the release notes might mean
 that my bug with bloat on hot tables had been fixed in 9.4:

 /Make VACUUM properly report dead but not-yet-removable rows to the
 statistics collector (Hari Babu)

 Previously these were reported as live rows./

 Unfortunately that is not the case, and we still have the problem in 9.4.

As far as I can tell from reviewing the thread, what we need to do
here is basically invent HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS.
There was a lot of concern about doing that in the back-branches, but
I'm skeptical that the concern is justified.
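
For concreteness, a minimal sketch of what that could look like (purely
hypothetical, not committed code; a real patch would also have to consider
an aborted or multixact xmax):

typedef enum
{
    HEAPTUPLE_DEAD,                 /* tuple is dead and deletable */
    HEAPTUPLE_LIVE,                 /* tuple is live (committed, no deleter) */
    HEAPTUPLE_RECENTLY_DEAD,        /* tuple is dead, but not deletable yet */
    HEAPTUPLE_INSERT_IN_PROGRESS,   /* inserting xact is still in progress */
    HEAPTUPLE_DELETE_IN_PROGRESS,   /* deleting xact is still in progress */
    HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS /* proposed: same xact did both */
} HTSV_Result;

/* inside HeapTupleSatisfiesVacuum, once xmin is known to be in progress: */
if (!(tuple->t_infomask & HEAP_XMAX_INVALID) &&
    !HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
    return HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS;
return HEAPTUPLE_INSERT_IN_PROGRESS;

VACUUM could then count such tuples as probably dead, while callers that
don't care remain free to treat the new code exactly like
INSERT_IN_PROGRESS.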

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2015-01-19 Thread tim_wilson
I was slightly optimistic that this comment in the release notes might mean
that my bug with bloat on hot tables had been fixed in 9.4:

/Make VACUUM properly report dead but not-yet-removable rows to the
statistics collector (Hari Babu)

Previously these were reported as live rows./

Unfortunately that is not the case, and we still have the problem in 9.4.







Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-09-10 Thread Bruce Momjian
On Thu, Jun 12, 2014 at 01:40:59PM +0200, Andres Freund wrote:
 Hi Tom,
 
 On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
  I figured it'd be easy enough to get a better estimate by adding another
  counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
  assuming that in-progress inserts and deletes will both commit).
 
 Did you plan to backpatch that? My inclination would be no...
 
   I did
  that, and found that it helped Tim's test case not at all :-(.  A bit of
  sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
  INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
  whether the transaction has since marked it for deletion:
  
  /*
   * It'd be possible to discern between INSERT/DELETE in progress
  * here by looking at xmax - but that doesn't seem beneficial for
   * the majority of callers and even detrimental for some. We'd
   * rather have callers look at/wait for xmin than xmax. It's
   * always correct to return INSERT_IN_PROGRESS because that's
   * what's happening from the view of other backends.
   */
  return HEAPTUPLE_INSERT_IN_PROGRESS;
  
  It did not use to blow this question off: back around 8.3 you got
  DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
  less laziness + fuzzy thinking here.  Maybe we should have a separate
  HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
  the case that callers other than VACUUM itself are okay with failing
  to make this distinction?  I'm dubious: there are very few if any
  callers that treat the INSERT and DELETE cases exactly alike.
 
 My current position on this is that we should leave the code as is <9.4
 and HEAPTUPLE_INSERT_IN_PROGRESS for the 9.4/master. Would you be ok
 with that? The second best thing imo would be to discern and return
 HEAPTUPLE_INSERT_IN_PROGRESS/HEAPTUPLE_DELETE_IN_PROGRESS for the
 respective cases.
 Which way would you like to go?

Did we ever come to a conclusion on this?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-25 Thread tim_wilson
Given that this seems to have slipped off the hackers' radar (or into the
too-hard basket), I have constructed a horrible solution.

I will stop using autovacuum for this relation, use our own system to
monitor the relation, and reset pg_class.reltuples on this relation to the
correct value after vacuum is done.

I note that vacuum.c has comments in vac_update_relstats saying that changes
to pg_class are done without a transaction. Are there dangers in my doing an
update pg_class set reltuples=6 where relkind='r' and
relname='my_hot_table' ?
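
For context, those pg_class changes are non-transactional because
vac_update_relstats overwrites the existing pg_class tuple in place rather
than creating a new row version. Roughly (a simplified sketch from memory;
relid and num_tuples stand in for the caller's values, and the real function
only writes when something actually changed):

Relation      rd;
HeapTuple     ctup;
Form_pg_class pgcform;

rd = heap_open(RelationRelationId, RowExclusiveLock);

/* fetch a copy of the pg_class row for the target relation */
ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
pgcform = (Form_pg_class) GETSTRUCT(ctup);

/* overwrite the field(s) of interest ... */
pgcform->reltuples = num_tuples;

/* ... and write the tuple back in place: no new row version, no XID */
heap_inplace_update(rd, ctup);

heap_close(rd, RowExclusiveLock);

An ordinary UPDATE of pg_class, by contrast, goes through the normal
transactional path and creates a new row version of the catalog tuple.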











Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-12 Thread Andres Freund
Hi Tom,

On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
 I figured it'd be easy enough to get a better estimate by adding another
 counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
 assuming that in-progress inserts and deletes will both commit).

Did you plan to backpatch that? My inclination would be no...

  I did
 that, and found that it helped Tim's test case not at all :-(.  A bit of
 sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
 INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
 whether the transaction has since marked it for deletion:
 
 /*
  * It'd be possible to discern between INSERT/DELETE in progress
  * here by looking at xmax - but that doesn't seem beneficial for
  * the majority of callers and even detrimental for some. We'd
  * rather have callers look at/wait for xmin than xmax. It's
  * always correct to return INSERT_IN_PROGRESS because that's
  * what's happening from the view of other backends.
  */
 return HEAPTUPLE_INSERT_IN_PROGRESS;
 
 It did not use to blow this question off: back around 8.3 you got
 DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
 less laziness + fuzzy thinking here.  Maybe we should have a separate
 HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
 the case that callers other than VACUUM itself are okay with failing
 to make this distinction?  I'm dubious: there are very few if any
 callers that treat the INSERT and DELETE cases exactly alike.

My current position on this is that we should leave the code as is <9.4
and HEAPTUPLE_INSERT_IN_PROGRESS for the 9.4/master. Would you be ok
with that? The second best thing imo would be to discern and return
HEAPTUPLE_INSERT_IN_PROGRESS/HEAPTUPLE_DELETE_IN_PROGRESS for the
respective cases.
Which way would you like to go?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-11 Thread Andres Freund
On 2014-06-09 11:24:22 -0700, Kevin Grittner wrote:
 Andres Freund and...@2ndquadrant.com wrote:
  On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:
 
  I am not sure, given predicate.c's coding, how
  HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate,
  since that's the contentious point with Tom? Since 'both in progress'
  can only happen if xmin and xmax are the same toplevel xid and you
  resolve subxids to toplevel xids I think it should currently be safe
  either way?
 
 The only way that it could be a problem is if the DELETE is in a
 subtransaction which might get rolled back without rolling back the
 INSERT.

The way I understand the code, in that case the subxid in xmax would have
been resolved to the toplevel xid.

/*
 * Find top level xid.  Bail out if xid is too early to be a conflict, or
 * if it's our own xid.
 */
if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
    return;
xid = SubTransGetTopmostTransaction(xid);
if (TransactionIdPrecedes(xid, TransactionXmin))
    return;
if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
    return;

That should essentially make that case harmless, right? So it seems the
optimization (and pessimization in other cases) of only tracking
toplevel xids saves the day here?

  If we ignore the conflict because we assume the INSERT
 will be negated by the DELETE, and that doesn't happen, we would
 get false negatives which would compromise correctness.  If we
 assume that the DELETE might not happen when the DELETE is not in a
 separate subtransaction we might get a false positive, which would
 only be a performance hit.  If we know either is possible and have
 a way to check in predicate.c, it's fine to check it there.

Given the above I don't think this can currently happen. Do I understand
it correctly? If so, it certainly deserves a comment...
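
Maybe something along these lines (just a first draft of wording):

/*
 * We rely on the subxid-to-toplevel-xid resolution above: a tuple whose
 * insert and delete are both still in progress can only have been inserted
 * and deleted by the same toplevel transaction, so by the time we get here
 * it does not matter whether HTSV reported INSERT_IN_PROGRESS or
 * DELETE_IN_PROGRESS for it.
 */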

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-11 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-09 11:24:22 -0700, Kevin Grittner wrote:

 The only way that it could be a problem is if the DELETE is in a
 subtransaction which might get rolled back without rolling back the
 INSERT.

 The way I understand the code, in that case the subxid in xmax would have
 been resolved to the toplevel xid.

 /*
  * Find top level xid.  Bail out if xid is too early to be a conflict, or
  * if it's our own xid.
  */
 if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
     return;
 xid = SubTransGetTopmostTransaction(xid);
 if (TransactionIdPrecedes(xid, TransactionXmin))
     return;
 if (TransactionIdEquals(xid, GetTopTransactionIdIfAny()))
     return;

 That should essentially make that case harmless, right?

Hmm.  Since the switch statement above doesn't limit the
HEAPTUPLE_DELETE_IN_PROGRESS or HEAPTUPLE_INSERT_IN_PROGRESS cases
by visibility, we are safe.  I had been thinking that we had early
returns based on visibility, like we do for HEAPTUPLE_LIVE and
HEAPTUPLE_RECENTLY_DEAD.  Since we don't do that, there is no
problem with the code either before or after your recent change. 
Apologies that I didn't notice that sooner.

It might be possible that with more guarantees of what those return
codes mean (possibly by adding the new one mentioned by Tom) we
could add that early return in one or both cases, which would
better optimize some corner cases involving subtransactions.  But
now doesn't seem like a good time to worry about that.  Such a
micro-optimization would not be a sane candidate for back-patching,
or for introducing this late in a release cycle.

So, nothing to see here, folks.  Move along.  predicate.c is
neutral in this matter.

Good spot, BTW!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Amit Kapila
On Sat, Jun 7, 2014 at 1:28 AM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
  I figured it'd be easy enough to get a better estimate by adding another
  counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
  assuming that in-progress inserts and deletes will both commit).  I did
  that, and found that it helped Tim's test case not at all :-(.  A bit of
  sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
  INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
  whether the transaction has since marked it for deletion:

  /*
   * It'd be possible to discern between INSERT/DELETE in progress
   * here by looking at xmax - but that doesn't seem beneficial for
   * the majority of callers and even detrimental for some. We'd
   * rather have callers look at/wait for xmin than xmax. It's
   * always correct to return INSERT_IN_PROGRESS because that's
   * what's happening from the view of other backends.
   */
  return HEAPTUPLE_INSERT_IN_PROGRESS;

 That's only been the case since a couple of days ago. I really wasn't sure
 whether to go that way or discern the two cases. That changed in the wake
 of:
 http://www.postgresql.org/message-id/20140530143150.GA11051@localhost

Won't this change impact the calculation of the number of live rows for
analyze? (acquire_sample_rows() counts HEAPTUPLE_DELETE_IN_PROGRESS tuples
as live rows when the deleting transaction is not the current transaction.)

Even if we think the estimates are okay, the comment below in
acquire_sample_rows() doesn't seem to suggest it.


/*
 * We count delete-in-progress rows as still live, using
 * the same reasoning given above; but we don't bother to
 * include them in the sample.
 *
..
*/
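
For reference, the case in question looks roughly like this (paraphrasing
from memory, so the details may be slightly off):

case HEAPTUPLE_DELETE_IN_PROGRESS:
    /*
     * Counted as live unless our own transaction is the one deleting
     * it; either way the row is not added to the sample.
     */
    if (TransactionIdIsCurrentTransactionId(
            HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
        deadrows += 1;
    else
        liverows += 1;
    break;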


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Robert Haas
On Fri, Jun 6, 2014 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It did not use to blow this question off: back around 8.3 you got
 DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
 less laziness + fuzzy thinking here.  Maybe we should have a separate
 HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
 the case that callers other than VACUUM itself are okay with failing
 to make this distinction?

I think that would be a good idea for conceptual clarity if nothing
else.  If callers are OK with it, then they can treat both of those
codes alike; but then at least there's clear evidence as to the
author's intent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Andres Freund
On 2014-06-09 10:14:32 -0400, Robert Haas wrote:
 On Fri, Jun 6, 2014 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  It did not use to blow this question off: back around 8.3 you got
  DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
  less laziness + fuzzy thinking here.  Maybe we should have a separate
  HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
  the case that callers other than VACUUM itself are okay with failing
  to make this distinction?
 
 I think that would be a good idea for conceptual clarity if nothing
 else.  If callers are OK with it, then they can treat both of those
 codes alike; but then at least there's clear evidence as to the
 author's intent.

I am happy to introduce the code for that. But it'd have to be >=9.4 or
<9.4?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-06-09 10:14:32 -0400, Robert Haas wrote:
 I think that would be a good idea for conceptual clarity if nothing
 else.  If callers are OK with it, then they can treat both of those
 codes alike; but then at least there's clear evidence as to the
 author's intent.

 I am happy to introduce the code for that. But it'd have to be >=9.4 or
 <9.4?

We need a solution that can be back-patched, unless you're prepared to
revert what you already did to HTSV in the back branches.

Having said that, it's not clear to me that we couldn't change HTSV's
API in the back branches.  What third-party code would be likely to
be depending on it?

regards, tom lane




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Andres Freund
On 2014-06-09 10:30:43 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-06-09 10:14:32 -0400, Robert Haas wrote:
  I think that would be a good idea for conceptual clarity if nothing
  else.  If callers are OK with it, then they can treat both of those
  codes alike; but then at least there's clear evidence as to the
  author's intent.
 
  I am happy to introduce the code for that. But it'd have to be >=9.4 or
  <9.4?
 
 We need a solution that can be back-patched, unless you're prepared to
 revert what you already did to HTSV in the back branches.

Well, I think reverting surely wouldn't be the right cure. It fixes a
somewhat nasty bug. I'd certainly be prepared to add the two lines
necessary to make it return DELETE_IN_PROGRESS after trying to
understand Kevin's email about predicate.c and going through the other
callers another time.
I did ask about what people think is the more appropriate return
value. And the only person that had voiced an opinion was Alvaro
agreeing that it's more correct to return INSERT_IN_PROGRESS... Don't
make this about me insisting on anything.

 Having said that, it's not clear to me that we couldn't change HTSV's
 API in the back branches.  What third-party code would be likely to
 be depending on it?

I'm not sure. I could imagine tools like pg_repack depending on it (it
doesn't). I started to search for external users of the function and have
only found a bunch of forks of postgres so far, several of which I've never
heard of before.
I think it's more reasonable to return DELETE_IN_PROGRESS in the existing
branches and then go to the new return value in 9.5, or maybe 9.4.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:

 Well, I think reverting surely wouldn't be the right cure. It
 fixes a somewhat nasty bug. I'd certainly be prepared to add the
 two lines necessary to make it return DELETE_IN_PROGRESS after
 trying to understand Kevin's email about predicate.c and going
 through the other callers another time.

I'm not actually sure yet whether the current state of affairs
causes a problem for the serializable transaction isolation level
implementation.  The most important thing to me is that whatever is
implemented is accurately documented in code comments so I can make
any necessary adjustments to code in predicate.c -- or possibly
determine that I *do* need some change to HTSV.  Right now the HTSV
embedded code comments suggest that the enum names and comments
don't accurately describe the conditions under which they are
returned, but I can't find anything else which does, short of
reverse-engineering that from some fairly complex code.

Perhaps it would be good if you could provide a concise description
of the conditions under which each value could currently be returned on
this (or the related) thread before we talk about what changes
might be needed?  Maybe this is clear to others involved in the
discussion, but I am not confident that I fully understand what
gets returned under what conditions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Andres Freund
On 2014-06-09 08:00:52 -0700, Kevin Grittner wrote:
 I'm not actually sure yet whether the current state of affairs
 causes a problem for the serializable transaction isolation level
 implementation.

I'd replied in the other thread before noticing you'd replied
here... From what I understand right now it's not affected at all.

I tried to make things a bit clearer there - but I am not sure I've
succeeded. I'm certainly willing to explain things further if you can tell
me which area is unclear.

 Right now the HTSV
 embedded code comments suggest that the enum names and comments
 don't accurately describe the conditions under which they are
 returned, but I can't find anything else which does, short of
 reverse-engineering that from some fairly complex code.

Not really. You could even argue the current code more correctly
represents the (old) comments:
HEAPTUPLE_INSERT_IN_PROGRESS,   /* inserting xact is still in progress */
HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
the current code will return INSERT_IN_PROGRESS even if the tuple has
*also* been deleted in another xact...
I think the problem here is that there's simply no way to really
represent that case accurately with the current API.

 Perhaps it would be good if you could provide a concise description
 of the conditions under which each value could currently be returned on
 this (or the related) thread before we talk about what changes
 might be needed? Maybe this is clear to others involved in the
 discussion, but I am not confident that I fully understand what
 gets returned under what conditions.

HEAPTUPLE_DEAD,                 /* tuple is dead and deletable */
1) xmin has committed, xmax has committed and wasn't just a locker. Xmax
precedes OldestXmin.
HEAPTUPLE_LIVE,                 /* tuple is live (committed, no deleter) */
1) xmin has committed, xmax unset
2) xmin has committed, xmax is locked only. Status of xmax is irrelevant
3) xmin has committed, xmax has aborted.
HEAPTUPLE_RECENTLY_DEAD,        /* tuple is dead, but not deletable yet */
1) xmin has committed, xmax has committed and wasn't only a locker. But
xmax doesn't precede OldestXmin.
HEAPTUPLE_INSERT_IN_PROGRESS,   /* inserting xact is still in progress */
new:
1) xmin is in progress, xmin is the current backend, xmax is invalid
2) xmin is in progress, xmin is the current backend, xmax only a locker
3) xmin is in progress, xmin is the current backend, xmax aborted
4) xmin is in progress, xmin is *not* current backend, xmax is irrelevant
old:
1) xmin is in progress, xmax is invalid
2) xmin is in progress, xmax is only a locker
HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
new:
1) xmin has committed, xmax is in progress, xmax is not just a locker
2) xmin is in progress, xmin is the current backend, xmax is not just a
   locker and in progress.
old:
1) xmin has committed, xmax is in progress, xmax is not just a locker
2) xmin is in progress, xmax is set and not just a locker

Note that the 2) case here never checked xmax's status.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-09 08:00:52 -0700, Kevin Grittner wrote:

 I tried to make things a bit clearer there - but I am not sure I've
 succeeded. I'm certainly willing to explain things further if you can tell
 me which area is unclear.

Thanks!  IMO, something like this should be included in the code
comments.

 HEAPTUPLE_INSERT_IN_PROGRESS,    /* inserting xact is still in progress */
 HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
 the current code will return INSERT_IN_PROGRESS even if the tuple has
 *also* been deleted in another xact...
 I think the problem here is that there's simply no way to really
 represent that case accurately with the current API.

For purposes of predicate.c, if the also deleted activity might
be rolled back without rolling back the insert, INSERT_IN_PROGRESS
is the only correct value.  If they will either both commit or
neither will commit, predicate.c would be more efficient if
HEAPTUPLE_RECENTLY_DEAD was returned, but I think
HEAPTUPLE_INSERT_IN_PROGRESS would be OK from a correctness PoV.

 Perhaps it would be good if you could provide a concise description
 of the conditions under which each value could currently be returned on
 this (or the related) thread before we talk about what changes
 might be needed? Maybe this is clear to others involved in the
 discussion, but I am not confident that I fully understand what
 gets returned under what conditions.

 HEAPTUPLE_DEAD,    /* tuple is dead and deletable */
 1) xmin has committed, xmax has committed and wasn't just a locker. Xmax
 precedes OldestXmin.

Perfect.

 HEAPTUPLE_LIVE,    /* tuple is live (committed, no deleter) */
 1) xmin has committed, xmax unset
 2) xmin has committed, xmax is locked only. Status of xmax is irrelevant
 3) xmin has committed, xmax has aborted.

Perfect.

 HEAPTUPLE_RECENTLY_DEAD,    /* tuple is dead, but not deletable yet */
 1) xmin has committed, xmax has committed and wasn't only a locker. But
 xmax doesn't precede OldestXmin.

For my purposes, it would be better if this also included:
 2) xmin is in progress, xmax matches (or includes) xmin

... but that would be only a performance tweak.

 HEAPTUPLE_INSERT_IN_PROGRESS,    /* inserting xact is still in progress */
 new:
 1) xmin is in progress, xmin is the current backend, xmax is invalid
 2) xmin is in progress, xmin is the current backend, xmax only a locker
 3) xmin is in progress, xmin is the current backend, xmax aborted
 4) xmin is in progress, xmin is *not* current backend, xmax is irrelevant
 old:
 1) xmin is in progress, xmax is invalid
 2) xmin is in progress, xmax is only a locker

I think this is OK from a correctness PoV.  There may be an
opportunity to optimize.  I will look more closely at whether it
seems likely to matter much, and what sort of change in the return
conditions or in predicate.c might be needed.

 HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
 new:
 1) xmin has committed, xmax is in progress, xmax is not just a locker
 2) xmin is in progress, xmin is the current backend, xmax is not just a
   locker and in progress.

I'm not clear on how 2) could happen unless xmax is the current
backend or a subtransaction thereof.  Could you clarify?

 old:
 1) xmin has committed, xmax is in progress, xmax is not just a locker
 2) xmin is in progress, xmax is set and not just a locker

 Note that the 2) case here never checked xmax's status.

Again, I'm not sure how 2) could happen unless they involve the
same top-level transaction.  What am I missing?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Andres Freund
On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:
 Andres Freund and...@2ndquadrant.com wrote:
  HEAPTUPLE_INSERT_IN_PROGRESS,    /* inserting xact is still in progress */
  HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
  the current code will return INSERT_IN_PROGRESS even if the tuple has
  *also* been deleted in another xact...
  I think the problem here is that there's simply no way to really
  represent that case accurately with the current API.
 
 For purposes of predicate.c, if the also deleted activity might
 be rolled back without rolling back the insert, INSERT_IN_PROGRESS
 is the only correct value.  If they will either both commit or
 neither will commit, predicate.c would be more efficient if
 HEAPTUPLE_RECENTLY_DEAD was returned, but I think
 HEAPTUPLE_INSERT_IN_PROGRESS would be OK from a correctness PoV.

That's basically the argument for the new behaviour.

But I am not sure, given predicate.c's coding, how
HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate,
since that's the contentious point with Tom? Since 'both in progress'
can only happen if xmin and xmax are the same toplevel xid and you
resolve subxids to toplevel xids I think it should currently be safe
either way?

  HEAPTUPLE_RECENTLY_DEAD,    /* tuple is dead, but not deletable yet */
  1) xmin has committed, xmax has committed and wasn't only a locker. But
  xmax doesn't precede OldestXmin.
 
 For my purposes, it would be better if this also included:
  2) xmin is in progress, xmax matches (or includes) xmin
 
 ... but that would be only a performance tweak.

I don't see that happening as there are several callers for which it is
important to know whether the xacts are still alive or not.

  HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
  new:
  1) xmin has committed, xmax is in progress, xmax is not just a locker
  2) xmin is in progress, xmin is the current backend, xmax is not just a
    locker and in progress.
 
 I'm not clear on how 2) could happen unless xmax is the current
 backend or a subtransaction thereof.  Could you clarify?
 
  old:
  1) xmin has committed, xmax is in progress, xmax is not just a locker
  2) xmin is in progress, xmax is set and not just a locker
 
  Note that the 2) case here never checked xmax's status.
 
 Again, I'm not sure how 2) could happen unless they involve the
 same top-level transaction.  What am I missing?

Right, both can only happen if the tuple is created & deleted in the
same backend. Is that in contradiction to something you see?

Andres




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:
 For purposes of predicate.c, if the also deleted activity might
 be rolled back without rolling back the insert, INSERT_IN_PROGRESS
 is the only correct value.

 That's basically the argument for the new behaviour.

 But I am not sure, given predicate.c's coding, how
 HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate,
 since that's the contentious point with Tom? Since 'both in progress'
 can only happen if xmin and xmax are the same toplevel xid and you
 resolve subxids to toplevel xids I think it should currently be safe
 either way?

Assuming that Kevin's describing his needs correctly, we could resolve
this by inventing HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS.  VACUUM could
assume that that's a probably-dead tuple, while SSI could do something
different.  I'm not sure if it's worth special-casing xmin == xmax,
where the tuple is guaranteed to never be of interest to any other
transaction?
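
(For what it's worth, the check itself would be cheap; a hypothetical
sketch, assuming tuple is the HeapTupleHeader already at hand:)

/*
 * Hypothetical check: inserted and deleted by exactly the same XID, so no
 * other transaction can ever be interested in this row version.  (This
 * would not catch a delete done in a subtransaction of the inserter.)
 */
if (!(tuple->t_infomask & HEAP_XMAX_INVALID) &&
    TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
                        HeapTupleHeaderGetUpdateXid(tuple)))
{
    /* ... treat as of no interest to anyone else ... */
}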

The reason this stuff is not too carefully spec'd is that when HTSV
was written, there was no expectation that there was any correctness
issue around which of these cases was returned.  I wonder whether SSI
should be using HTSV at all.

regards, tom lane




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:

 I am not sure, given predicate.c's coding, how
 HEAPTUPLE_DELETE_IN_PROGRESS could cause problems. Could you elaborate,
 since that's the contentious point with Tom? Since 'both in progress'
 can only happen if xmin and xmax are the same toplevel xid and you
 resolve subxids to toplevel xids I think it should currently be safe
 either way?

The only way that it could be a problem is if the DELETE is in a
subtransaction which might get rolled back without rolling back the
INSERT.  If we ignore the conflict because we assume the INSERT
will be negated by the DELETE, and that doesn't happen, we would
get false negatives which would compromise correctness.  If we
assume that the DELETE might not happen when the DELETE is not in a
separate subtransaction we might get a false positive, which would
only be a performance hit.  If we know either is possible and have
a way to check in predicate.c, it's fine to check it there.

 HEAPTUPLE_RECENTLY_DEAD,    /* tuple is dead, but not deletable yet */
 1) xmin has committed, xmax has committed and wasn't only a locker. But
 xmax doesn't precede OldestXmin.

  For my purposes, it would be better if this also included:
   2) xmin is in progress, xmax matches (or includes) xmin

  ... but that would be only a performance tweak.

 I don't see that happening as there's several callers for which it is
 important to know whether the xacts are still alive or not.

OK

 HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting xact is still in progress */
 new:
 1) xmin has committed, xmax is in progress, xmax is not just a locker
 2) xmin is in progress, xmin is the current backend, xmax is not just a
   locker and in progress.

  I'm not clear on how 2) could happen unless xmax is the current
  backend or a subtransaction thereof.  Could you clarify?

 old:
 1) xmin has committed, xmax is in progress, xmax is not just a locker
 2) xmin is in progress, xmax is set and not just a locker

 Note that the 2) case here never checked xmax's status.

  Again, I'm not sure how 2) could happen unless they involve the
  same top-level transaction.  What am I missing?

 Right, both can only happen if the tuple is created & deleted in the
 same backend. Is that in contradiction to something you see?

Well, you're making a big point that the status of xmax was not
checked in the old code.  If xmax is the same as xmin and xmin is
in progress, the additional check seems redundant -- unless I'm
missing something.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:

 The reason this stuff is not too carefully spec'd is that when
 HTSV was written, there was no expectation that there was any
 correctness issue around which of these cases was returned.  I
 wonder whether SSI should be using HTSV at all.

That's certainly a reasonable question.  When I was writing the SSI
code, I was blissfully unaware of HTSV and had coded up a way to
check this which seemed to work for all tests we had.  Jeff Davis,
reviewing the code, was concerned that such separate code was more
likely to miss something or to break as visibility handling
changed.  He argued that HTSV was basically checking for the same
things I was, and a redundant version which did the check
differently was a bad idea.  Here is where it was discussed during
development:

http://www.postgresql.org/message-id/flat/1296499247.11513.777.camel@jdavis#1296499247.11513.777.camel@jdavis

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Alvaro Herrera
Kevin Grittner wrote:
 Andres Freund and...@2ndquadrant.com wrote:
  On 2014-06-09 09:45:12 -0700, Kevin Grittner wrote:

  old:
  1) xmin has committed, xmax is in progress, xmax is not just a locker
  2) xmin is in progress, xmax is set and not just a locker
 
  Note that the 2) case here never checked xmax's status.
 
   Again, I'm not sure how 2) could happen unless they involve the
   same top-level transaction.  What am I missing?
 
  Right, both can only happen if the tuple is created & deleted in the
  same backend. Is that in contradiction to something you see?
 
 Well, you're making a big point that the status of xmax was not
 checked in the old code.  If xmax is the same as xmin and xmin is
 in progress, the additional check seems redundant -- unless I'm
 missing something.

IIRC the case that prompted the fix was a CREATE INDEX in the same
transaction that created a tuple which was later deleted by an aborted
subtransaction.  If the creating transaction is not this backend, that's
fine.  But if the creating transaction IsCurrentTransactionId() then we
need to be careful about aborted subxacts: if a tuple was deleted by an
aborted subxact then it's still visible to this transaction.  Xmax must
be checked in this case.  Note that the difference is pretty specific to
CREATE INDEX.  Vacuuming doesn't have to worry about such cases, mainly
because you can't run vacuum in a transaction.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-09 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:

 Assuming that Kevin's describing his needs correctly, we could resolve
 this by inventing HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS.  VACUUM could
 assume that that's a probably-dead tuple, while SSI could do something
 different.

That could work.

On the other hand, the old code, where such a transaction showed as
HEAPTUPLE_DELETE_IN_PROGRESS, might still work for predicate.c as
long as it's clear that this return code sometimes means insert
and delete are both in progress and the insert might commit without
committing the delete.  Those conditions could be identified
within predicate.c; although it seems like that would involve
duplicating some work which was already in HTSV, with the usual
costs and risks of duplicate code.

 I'm not sure if it's worth special-casing xmin == xmax,
 where the tuple is guaranteed to never be of interest to any other
 transaction?

That could be checked in predicate.c.  I see no reason to create a
separate return code for it if it's not of interest for other
callers.  It could even be left as a later optimization, since a
false positive serialization failure doesn't compromise
correctness, but it seems like it is probably easy enough to cover
to just do so now.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-06 Thread Andres Freund
On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
 I figured it'd be easy enough to get a better estimate by adding another
 counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
 assuming that in-progress inserts and deletes will both commit).  I did
 that, and found that it helped Tim's test case not at all :-(.  A bit of
 sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
 INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
 whether the transaction has since marked it for deletion:
 
 /*
  * It'd be possible to discern between INSERT/DELETE in progress
  * here by looking at xmax - but that doesn't seem beneficial for
  * the majority of callers and even detrimental for some. We'd
  * rather have callers look at/wait for xmin than xmax. It's
  * always correct to return INSERT_IN_PROGRESS because that's
  * what's happening from the view of other backends.
  */
 return HEAPTUPLE_INSERT_IN_PROGRESS;

That's only been the case since a couple of days ago. I really wasn't sure
whether to go that way or discern the two cases. That changed in the wake
of:
http://www.postgresql.org/message-id/20140530143150.GA11051@localhost

I tried to solicit feedback (e.g. by CCing you :)) but I mostly
failed. Alvaro agreed, on IM, that it's better this way.

 It did not use to blow this question off: back around 8.3 you got
 DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
 less laziness + fuzzy thinking here.

My argument for not discerning wasn't that it's hard to do, but that it
might confuse callers more the other way round. E.g. doing a
XactLockTableWait(xmax) might not be sufficient to establish whether the
tuple is alive.


 Maybe we should have a separate
 HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?

Maybe.

 Is it *really*
 the case that callers other than VACUUM itself are okay with failing
 to make this distinction?  I'm dubious: there are very few if any
 callers that treat the INSERT and DELETE cases exactly alike.

I looked through all of them and saw none that'd be problematic. And in
some, like predicate.c, the new behaviour seems to be better. Most
of the ones that care about INSERT/DELETE_IN_PROGRESS wait on xmin/xmax
respectively.
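
Schematically, the callers that do care follow this pattern (a sketch of
the common shape, not any particular caller; exact signatures vary):

switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buffer))
{
    case HEAPTUPLE_INSERT_IN_PROGRESS:
        /* the inserter decides the tuple's fate: look at / wait for xmin */
        break;
    case HEAPTUPLE_DELETE_IN_PROGRESS:
        /* the deleter decides the tuple's fate: look at / wait for xmax */
        break;
    case HEAPTUPLE_LIVE:
    case HEAPTUPLE_RECENTLY_DEAD:
    case HEAPTUPLE_DEAD:
        /* no in-progress transaction to wait on */
        break;
}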

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

