Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-02-03 Thread Noah Misch
On Wed, Jan 30, 2013 at 11:37:41PM +0530, Pavan Deolasee wrote:
 On Wed, Jan 30, 2013 at 7:34 AM, Noah Misch n...@leadboat.com wrote:
  On Mon, Jan 28, 2013 at 07:24:04PM +0530, Pavan Deolasee wrote:
  On Wed, Jan 23, 2013 at 10:05 AM, Noah Misch n...@leadboat.com wrote:
 
   You're the second commentator to be skittish about the patch's
   correctness, so I won't argue against a conservatism-motivated bounce
   of the patch.
 
 Can you please rebase the patch against the latest head? I see
 Alvaro's and Simon's recent changes have bit-rotted the patch.
 
  Attached.
 
 Thanks. Here are a few comments.
 
 1. I saw you took care of the bug that I reported in the first email
 by allowing overwriting an LP_DEAD itemid during recovery. While this
 should be OK given that we had never seen reports of recovery trying
 to accidentally overwrite an LP_DEAD itemid, are you sure after this
 change we can't get into this situation post reachedConsistency? If
 we ever do, I think the recovery would just fail.

Having pondered this again, I conclude that checking reachedConsistency is
wrong.  As a simple example, during ongoing streaming replication, the startup
process will regularly be asked to overwrite LP_DEAD items.  Those items were
LP_UNUSED on the master, but only a full-page image would transfer that fact
to the standby before something overwrites the item.
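
To make the divergence concrete, here is the sequence I have in mind (a
hypothetical timeline, not code from the patch):

    master:   prune leaves item 5 LP_DEAD
              VACUUM's second pass sets item 5 LP_UNUSED (not WAL-logged)
              an INSERT reuses item 5; its WAL record references the slot
    standby:  replayed only the prune, so item 5 is still LP_DEAD;
              replaying the INSERT must overwrite the LP_DEAD item, long
              after consistency has been reached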

That shows another flaw.  VACUUM marking a page all-visible is WAL-logged, so
the standby will receive that change.  But the standby may still have LP_DEAD
pointers on the affected page, where the master has LP_UNUSED.  I'm not aware
of any severe consequences; no scan type would be fooled.  Still, this
violates a current assumption of the system.

 2. In lazy_scan_heap() after detecting a DEAD tuple you now try to
 confirm that the tuple must not require a freeze. Is that really safe?
 I think this assumes that the HOT prune would have already truncated
 *every* DEAD tuple to LP_DEAD except an INSERT_IN_PROGRESS which may
 have turned into DEAD between two calls to HeapTupleSatisfiesVacuum().
 While this might be true, we never tried hard to prove that before
 because it wasn't necessary. I remember Heikki raising this concern
 when I proposed setting the VM bit after HOT prune. So a proof of that
 would probably help.

Yes, the patch assumes all that.  Since lazy_scan_heap() already refuses to
remove a heap-only or HOT-updated HEAPTUPLE_DEAD tuple, heap_page_prune() had
better not miss truncating those.  Otherwise, we would pass such a tuple to
heap_freeze_tuple(), which assumes the tuple is not HEAPTUPLE_DEAD.  The case
where heap_page_prune() could be leaving a HEAPTUPLE_DEAD tuple today without
such problems is a simple tuple not participating in HOT.  I think it's clear
from inspection that heap_page_prune() will visit all such tuples, and
heap_prune_chain() will truncate them when visited.
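
For reference, the inspection argument reduces to the shape of
heap_page_prune()'s main loop; paraphrased (a simplified sketch, not the
verbatim source):

	/* Sketch of heap_page_prune()'s scan over the page */
	for (offnum = FirstOffsetNumber;
		 offnum <= PageGetMaxOffsetNumber(page);
		 offnum = OffsetNumberNext(offnum))
	{
		ItemId		itemid = PageGetItemId(page, offnum);

		/*
		 * Only empty and already-dead slots are skipped, so every
		 * LP_NORMAL tuple -- including a simple, non-HOT DEAD one --
		 * reaches heap_prune_chain(), which truncates it to LP_DEAD.
		 */
		if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
			continue;

		ndeleted += heap_prune_chain(relation, buffer, offnum,
									 OldestXmin, &prstate);
	}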

Since a tuple can move from HEAPTUPLE_INSERT_IN_PROGRESS to HEAPTUPLE_DEAD at
any instant here, heap_freeze_tuple() must not misbehave on such tuples.
Indeed, it does not; since the xmin was running at some point after we chose a
freeze horizon, the xmin will be above that horizon.  Therefore, the
try_freeze dance I added ought to be unnecessary.
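
Concretely, heap_freeze_tuple()'s xmin handling is just a horizon
comparison; in sketch form (paraphrased, not the verbatim source):

	xid = HeapTupleHeaderGetXmin(tuple);
	if (TransactionIdIsNormal(xid) &&
		TransactionIdPrecedes(xid, cutoff_xid))	/* cutoff = FreezeLimit */
		HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);

An xmin still running after we computed FreezeLimit cannot precede it, so a
tuple that went from INSERT_IN_PROGRESS to DEAD in that window is simply
left alone.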

 3. Converting LP_DEAD to LP_UNUSED in lazy_vacuum_page() while holding
 just a SHARE lock seems to be OK, but is a bit adventurous. I would
 rather just get an EX lock and do it.

Fair enough.

 Also, it's probably more
 appropriate to just mark the buffer dirty instead of
 SetBufferCommitInfoNeedsSave().

There exist precedents for both styles.

 It may cause line pointer bloat and
 vacuum may not come to process this page again.

How would that happen?

 This will also be kind
 of unnecessary after the patch to set VM bit in the second phase gets
 in.

To the extent that pages tend to become all-visible after removing dead
material, yes.  When a long-running transaction is holding back the
all-visible horizon, it will still matter.

 4. Are changes in the variable names and logic around them in
 lazy_scan_heap() really required? Just makes me a bit uncomfortable.
 See if we can minimize those changes or do it in a separate patch, if
 possible.

I assume you refer to the replacement of all_visible and has_dead_tuples
with has_nonlive and has_too_recent.  As ever, it would be hard to
honestly say that those changes were *required*.  They serve the side goal of
fixing the bookkeeping behind one of the warnings, which, independent of this
patch, isn't firing as often as it should.  I could split that out as an
independent patch.

 I haven't run tests with the patch yet. Will see if I can try a few. I
 share others' views on making these changes late in the cycle, but if
 we can reduce the footprint of the patch, we should be OK.

Given those reactions and my shortness of time to solidly fix everything noted
above, let's table this patch.  I'm marking it Returned with Feedback.

Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-30 Thread Pavan Deolasee
On Wed, Jan 30, 2013 at 7:34 AM, Noah Misch n...@leadboat.com wrote:
 On Mon, Jan 28, 2013 at 07:24:04PM +0530, Pavan Deolasee wrote:
 On Wed, Jan 23, 2013 at 10:05 AM, Noah Misch n...@leadboat.com wrote:

  You're the second commentator to be skittish about the patch's
  correctness, so I won't argue against a conservatism-motivated bounce
  of the patch.

 Can you please rebase the patch against the latest head? I see
 Alvaro's and Simon's recent changes have bit-rotted the patch.

 Attached.

Thanks. Here are a few comments.

1. I saw you took care of the bug that I reported in the first email
by allowing overwriting an LP_DEAD itemid during recovery. While this
should be OK given that we had never seen reports of recovery trying
to accidentally overwrite an LP_DEAD itemid, are you sure after this
change we can't get into this situation post reachedConsistency? If
we ever do, I think the recovery would just fail.

2. In lazy_scan_heap() after detecting a DEAD tuple you now try to
confirm that the tuple must not require a freeze. Is that really safe?
I think this assumes that the HOT prune would have already truncated
*every* DEAD tuple to LP_DEAD except an INSERT_IN_PROGRESS which may
have turned into DEAD between two calls to HeapTupleSatisfiesVacuum().
While this might be true, we never tried hard to prove that before
because it wasn't necessary. I remember Heikki raising this concern
when I proposed setting the VM bit after HOT prune. So a proof of that
would probably help.

3. Converting LP_DEAD to LP_UNUSED in lazy_vacuum_page() while holding
just a SHARE lock seems to be OK, but is a bit adventurous. I would
rather just get an EX lock and do it. Also, it's probably more
appropriate to just mark the buffer dirty instead of
SetBufferCommitInfoNeedsSave(). It may cause line pointer bloat and
vacuum may not come to process this page again. This will also be kind
of unnecessary after the patch to set VM bit in the second phase gets
in.

4. Are changes in the variable names and logic around them in
lazy_scan_heap() really required? Just makes me a bit uncomfortable.
See if we can minimize those changes or do it in a separate patch, if
possible.

I haven't run tests with the patch yet. Will see if I can try a few. I
share others' views on making these changes late in the cycle, but if
we can reduce the footprint of the patch, we should be OK.

I see the following (and similar) messages while applying the patch,
but maybe they are harmless.

(Stripping trailing CRs from patch.)

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee




Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-29 Thread Noah Misch
On Mon, Jan 28, 2013 at 07:24:04PM +0530, Pavan Deolasee wrote:
 On Wed, Jan 23, 2013 at 10:05 AM, Noah Misch n...@leadboat.com wrote:
 
  You're the second commentator to be skittish about the patch's
  correctness, so I won't argue against a conservatism-motivated bounce
  of the patch.
 
 Can you please rebase the patch against the latest head? I see
 Alvaro's and Simon's recent changes have bit-rotted the patch.

Attached.
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***
*** 5553,5584  HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
  }
  
  /*
-  * Perform XLogInsert to register a heap cleanup info message. These
-  * messages are sent once per VACUUM and are required because
-  * of the phasing of removal operations during a lazy VACUUM.
-  * see comments for vacuum_log_cleanup_info().
-  */
- XLogRecPtr
- log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
- {
-   xl_heap_cleanup_info xlrec;
-   XLogRecPtr  recptr;
-   XLogRecData rdata;
- 
-   xlrec.node = rnode;
-   xlrec.latestRemovedXid = latestRemovedXid;
- 
-   rdata.data = (char *) xlrec;
-   rdata.len = SizeOfHeapCleanupInfo;
-   rdata.buffer = InvalidBuffer;
-   rdata.next = NULL;
- 
-   recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_CLEANUP_INFO, rdata);
- 
-   return recptr;
- }
- 
- /*
   * Perform XLogInsert for a heap-clean operation.  Caller must already
   * have modified the buffer and marked it dirty.
   *
--- 5553,5558 
***
*** 5930,5956  log_newpage_buffer(Buffer buffer)
  }
  
  /*
-  * Handles CLEANUP_INFO
-  */
- static void
- heap_xlog_cleanup_info(XLogRecPtr lsn, XLogRecord *record)
- {
-	xl_heap_cleanup_info *xlrec = (xl_heap_cleanup_info *) XLogRecGetData(record);
- 
-   if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid, xlrec->node);
- 
-   /*
-	 * Actual operation is a no-op. Record type exists to provide a means for
-	 * conflict processing to occur before we begin index vacuum actions. see
-* vacuumlazy.c and also comments in btvacuumpage()
-*/
- 
-   /* Backup blocks are not used in cleanup_info records */
-	Assert(!(record->xl_info & XLR_BKP_BLOCK_MASK));
- }
- 
- /*
   * Handles HEAP2_CLEAN record type
   */
  static void
--- 5904,5909 
***
*** 7057,7065  heap2_redo(XLogRecPtr lsn, XLogRecord *record)
case XLOG_HEAP2_CLEAN:
heap_xlog_clean(lsn, record);
break;
-   case XLOG_HEAP2_CLEANUP_INFO:
-   heap_xlog_cleanup_info(lsn, record);
-   break;
case XLOG_HEAP2_VISIBLE:
heap_xlog_visible(lsn, record);
break;
--- 7010,7015 
*** a/src/backend/access/heap/pruneheap.c
--- b/src/backend/access/heap/pruneheap.c
***
*** 121,133  heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin)
 * have pruned while we hold pin.)
 */
  	if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
- 	{
- 		TransactionId ignore = InvalidTransactionId;	/* return value not
- 														 * needed */
- 
  		/* OK to prune */
! 		(void) heap_page_prune(relation, buffer, OldestXmin, true, &ignore);
! 	}
  
/* And release buffer lock */
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
--- 121,128 
 * have pruned while we hold pin.)
 */
  	if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
  		/* OK to prune */
! 		(void) heap_page_prune(relation, buffer, OldestXmin, true);
  
/* And release buffer lock */
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
***
*** 148,159  heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin)
   * send its own new total to pgstats, and we don't want this delta applied
   * on top of that.)
   *
!  * Returns the number of tuples deleted from the page and sets
!  * latestRemovedXid.
   */
  int
  heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
! 				bool report_stats, TransactionId *latestRemovedXid)
  {
int ndeleted = 0;
Pagepage = BufferGetPage(buffer);
--- 143,153 
   * send its own new total to pgstats, and we don't want this delta applied
   * on top of that.)
   *
!  * Returns the number of tuples deleted from the page.
   */
  int
  heap_page_prune(Relation relation, Buffer buffer, 

Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-28 Thread Pavan Deolasee
On Wed, Jan 23, 2013 at 10:05 AM, Noah Misch n...@leadboat.com wrote:

 You're the second commentator to be skittish about the patch's correctness, so
 I won't argue against a conservatism-motivated bounce of the patch.

Can you please rebase the patch against the latest head? I see
Alvaro's and Simon's recent changes have bit-rotted the patch.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee




Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-28 Thread Simon Riggs
On 23 January 2013 04:35, Noah Misch n...@leadboat.com wrote:

 Also, perhaps we should
 consider Simon's one-liner fix for backpatching this instead of the
 original patch you posted?

 I have no nontrivial preference between the two approaches.

Sorry, I didn't see this. I guess you saw I applied my one-liner and
backpatched it.

I'm expecting to apply Noah's larger patch to HEAD with the suggested
edits. I'll do that last thing in the CF.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-28 Thread Noah Misch
On Mon, Jan 28, 2013 at 02:12:33PM +, Simon Riggs wrote:
 On 23 January 2013 04:35, Noah Misch n...@leadboat.com wrote:
  Also, perhaps we should
  consider Simon's one-liner fix for backpatching this instead of the
  original patch you posted?
 
  I have no nontrivial preference between the two approaches.
 
 Sorry, I didn't see this. I guess you saw I applied my one-liner and
 backpatched it.

Yes; thanks.

 I'm expecting to apply Noah's larger patch to HEAD with the suggested
 edits. I'll do that last thing in the CF.

What suggested edits do you have in mind?




Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-22 Thread Stephen Frost
Noah,

* Noah Misch (n...@leadboat.com) wrote:
 The attached update fixes both
 problems.  (I have also attached the unchanged backpatch-oriented fix to keep
 things together.)

I've just started looking at/playing with this patch and was wondering
if you'd missed Jeff's comments on it..?  I note that prev_dead_count is
still in this patch- any particular reason..?  Also, perhaps we should
consider Simon's one-liner fix for backpatching this instead of the
original patch you posted?

To be honest, I'm also a bit concerned about applying the patch you
provided for HEAD this late in the 9.3 cycle, especially if the plan is
to make further changes in this area to simplify things moving forward.
Perhaps we could do all of those changes early in the 9.4 cycle?

Thanks,

Stephen




Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-22 Thread Noah Misch
On Sat, Jan 19, 2013 at 02:58:32PM -0800, Jeff Janes wrote:
 I have a preliminary nit-pick on the big patch.  It generates a compiler
 warning:
 
 vacuumlazy.c: In function ‘lazy_scan_heap’:
 vacuumlazy.c:445:9: warning: variable ‘prev_dead_count’ set but not used
 [-Wunused-but-set-variable]

Thanks.  That variable can just go away.  Since a full review may turn up more
things, I'll hold off on sending a version fixing that alone.




Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-22 Thread Noah Misch
On Tue, Jan 22, 2013 at 09:45:37PM -0500, Stephen Frost wrote:
 * Noah Misch (n...@leadboat.com) wrote:
  The attached update fixes both
  problems.  (I have also attached the unchanged backpatch-oriented fix to 
  keep
  things together.)
 
 I've just started looking at/playing with this patch and was wondering
 if you'd missed Jeff's comments on it..?  I note that prev_dead_count is
 still in this patch - any particular reason..?

Thanks for the reminder; I've now replied to Jeff.

 Also, perhaps we should
 consider Simon's one-liner fix for backpatching this instead of the
 original patch you posted?

I have no nontrivial preference between the two approaches.

 To be honest, I'm also a bit concerned about applying the patch you
 provided for HEAD this late in the 9.3 cycle, especially if the plan is
 to make further changes in this area to simplify things moving forward.
 Perhaps we could do all of those changes early in the 9.4 cycle?

Concerning further changes, I suppose you refer to Pavan's two designs noted
upthread?  I don't recommend going out of our way to consider all these
changes together; there are disadvantages, too, of making several VACUUM
performance changes in short succession.  I'm also not aware of concrete plans
to implement those designs.

You're the second commentator to be skittish about the patch's correctness, so
I won't argue against a conservatism-motivated bounce of the patch.

Thanks,
nm




Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-19 Thread Jeff Janes
On Wednesday, January 9, 2013, Noah Misch wrote:

 On Thu, Jan 10, 2013 at 02:45:36AM +, Simon Riggs wrote:
  On 8 January 2013 02:49, Noah Misch n...@leadboat.com wrote:
   There is a bug in lazy_scan_heap()'s
   bookkeeping for the xid to place in that WAL record.  Each call to
   heap_page_prune() simply overwrites vacrelstats->latestRemovedXid, but
   lazy_scan_heap() expects it to only ever increase the value.  I have
   attached a minimal fix to be backpatched.  It has lazy_scan_heap() ignore
   heap_page_prune()'s actions for the purpose of this conflict xid, because
   heap_page_prune() emitted an XLOG_HEAP2_CLEAN record covering them.
 
  Interesting. Yes, a bug, and one of mine also.
 
  ISTM the right fix is to correctly initialize, on pruneheap.c line 176,
  prstate.latestRemovedXid = *latestRemovedXid;
  better to make it work than to just leave stuff hanging.

 That works, too.


As bug fixes don't usually go through the commit-fest process, will someone
be committing one of these two ideas for the back-branches?  And to HEAD,
in case the more invasive patch doesn't make it in?

I have a preliminary nit-pick on the big patch.  It generates a compiler
warning:

vacuumlazy.c: In function ‘lazy_scan_heap’:
vacuumlazy.c:445:9: warning: variable ‘prev_dead_count’ set but not used
[-Wunused-but-set-variable]


Thanks,

Jeff


Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-09 Thread Noah Misch
Hi Pavan,

Thanks for reviewing.

On Tue, Jan 08, 2013 at 02:41:54PM +0530, Pavan Deolasee wrote:
 On Tue, Jan 8, 2013 at 8:19 AM, Noah Misch n...@leadboat.com wrote:
  At that point in the investigation, I realized that the cost of being able
  to remove entire tuples in lazy_vacuum_heap() greatly exceeds the benefit.
  Again, the benefit is being able to remove tuples whose inserting
  transaction aborted between the HeapTupleSatisfiesVacuum() call in
  heap_page_prune() and the one in lazy_scan_heap().  To make that possible,
  lazy_vacuum_heap() grabs a cleanup lock, calls PageRepairFragmentation(),
  and emits a WAL record for every page containing LP_DEAD line pointers or
  HEAPTUPLE_DEAD tuples.  If we take it out of the business of removing
  tuples, lazy_vacuum_heap() can skip WAL and update lp_flags under a mere
  shared lock.  The second attached patch, for HEAD, implements that.
  Besides optimizing things somewhat, it simplifies the code and removes
  rarely-tested branches.  (This patch supersedes the backpatch-oriented
  patch rather than stacking with it.)
 
 
 +1. I'd also advocated removing the line pointer array scan in
 lazy_scan_heap() because heap_page_prune() does most of the work. The
 reason why we still have that scan in lazy_scan_heap() is to check for new
 dead tuples (which should be rare), check for all-visibility of the page
 and freeze tuples if possible. I think we can move all of that to
 heap_page_prune().

Yes; that was an interesting thread.

 But while you are at it, I wonder if you have time and energy to look at
 the single-pass vacuum work that I, Robert and others tried earlier but
 failed to get to the final stage [1][2]. If we do something like that, we
 might not need the second scan of the heap at all, which, as you rightly
 pointed out, does not do much today anyway, other than setting LP_DEAD line
 pointers to LP_UNUSED. We can push that work to the next vacuum or even a
 foreground task.

I don't wish to morph this patch into a removal of lazy_vacuum_heap(), but
that will remain promising for the future.

 BTW, with respect to your idea of not WAL logging the operation of setting
 LP_DEAD to LP_UNUSED in lazy_vacuum_page(), I wonder if this would create
 problems with crash recovery. During normal vacuum operation, you may have
 converted LP_DEAD to LP_UNUSED. Then you actually used one of those line
 pointers for a subsequent PageAddItem() on the page. If you crash after that
 but before the page gets written to disk, during crash recovery, AFAICS
 PageAddItem() will complain with a "will not overwrite a used ItemId"
 warning and a subsequent PANIC of the recovery.

Good catch; I can reproduce that problem.  I've now taught PageAddItem() to
tolerate replacing an LP_DEAD item until recovery reaches consistency.  Also,
since lazy_vacuum_page() no longer calls PageRepairFragmentation(), it should
call PageSetHasFreeLinePointers() itself.  The attached update fixes both
problems.  (I have also attached the unchanged backpatch-oriented fix to keep
things together.)
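
In sketch form, the relaxation I have in mind for PageAddItem()'s overwrite
guard looks like this (condensed and illustrative, not the committed code;
reachedConsistency is the existing xlog.c flag):

	if (ItemIdIsUsed(itemId))
	{
		/*
		 * Tolerate overwriting an LP_DEAD item until recovery reaches
		 * consistency; refuse everything else, as before.
		 */
		if (!(InRecovery && !reachedConsistency && ItemIdIsDead(itemId)))
		{
			elog(WARNING, "will not overwrite a used ItemId");
			return InvalidOffsetNumber;
		}
	}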

nm
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***
*** 482,487  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
--- 482,488 
boolall_visible;
boolhas_dead_tuples;
TransactionId visibility_cutoff_xid = InvalidTransactionId;
+   TransactionId unused;
  
if (blkno == next_not_all_visible_block)
{
***
*** 676,682  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
  	 * We count tuples removed by the pruning step as removed by VACUUM.
 */
tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
! 									 &vacrelstats->latestRemovedXid);
  
/*
  	 * Now scan the page to collect vacuumable items and check for tuples
--- 677,683 
  	 * We count tuples removed by the pruning step as removed by VACUUM.
 */
tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
! 									 &unused);
  
/*
  	 * Now scan the page to collect vacuumable items and check for tuples
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***
*** 4262,4293  HeapTupleHeaderAdvanceLatestRemovedXid(HeapTupleHeader tuple,
  }
  
  /*
-  * Perform XLogInsert to register a heap cleanup info message. These
-  * messages are sent once per VACUUM and are required because
-  * of the phasing of removal operations during a lazy VACUUM.
-  * see comments for vacuum_log_cleanup_info().
-  */
- XLogRecPtr
- log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
- {
-   

Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-09 Thread Simon Riggs
On 8 January 2013 02:49, Noah Misch n...@leadboat.com wrote:

 There is a bug in lazy_scan_heap()'s
 bookkeeping for the xid to place in that WAL record.  Each call to
 heap_page_prune() simply overwrites vacrelstats->latestRemovedXid, but
 lazy_scan_heap() expects it to only ever increase the value.  I have
 attached a minimal fix to be backpatched.  It has lazy_scan_heap() ignore
 heap_page_prune()'s actions for the purpose of this conflict xid, because
 heap_page_prune() emitted an XLOG_HEAP2_CLEAN record covering them.

Interesting. Yes, a bug, and one of mine also.

ISTM the right fix is to correctly initialize, on pruneheap.c line 176,
prstate.latestRemovedXid = *latestRemovedXid;
better to make it work than to just leave stuff hanging.

I very much like your patch to remove all that cruft altogether; good
riddance. I think you've missed removing a few calls to
HeapTupleHeaderAdvanceLatestRemovedXid(), perhaps even that routine as
well.

Not sure about the WAL-skipping and share-locking part; that's
too radical for me.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-09 Thread Noah Misch
On Thu, Jan 10, 2013 at 02:45:36AM +, Simon Riggs wrote:
 On 8 January 2013 02:49, Noah Misch n...@leadboat.com wrote:
  There is a bug in lazy_scan_heap()'s
  bookkeeping for the xid to place in that WAL record.  Each call to
  heap_page_prune() simply overwrites vacrelstats->latestRemovedXid, but
  lazy_scan_heap() expects it to only ever increase the value.  I have
  attached a minimal fix to be backpatched.  It has lazy_scan_heap() ignore
  heap_page_prune()'s actions for the purpose of this conflict xid, because
  heap_page_prune() emitted an XLOG_HEAP2_CLEAN record covering them.
 
 Interesting. Yes, a bug, and one of mine also.
 
 ISTM the right fix is to correctly initialize, on pruneheap.c line 176,
 prstate.latestRemovedXid = *latestRemovedXid;
 better to make it work than to just leave stuff hanging.

That works, too.

 I very much like your patch to remove all that cruft altogether; good
 riddance. I think you've missed removing a few calls to
 HeapTupleHeaderAdvanceLatestRemovedXid(), perhaps even that routine as
 well.

Thanks.  Did you have a particular HeapTupleHeaderAdvanceLatestRemovedXid()
call site in mind?  The three calls remaining in the tree look right to me.

 Not sure about the WAL-skipping and share-locking part; that's
 too radical for me.

Certainly a fair point of discussion.  In particular, using a plain exclusive
lock wouldn't be much worse.




Re: [HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-08 Thread Pavan Deolasee
On Tue, Jan 8, 2013 at 8:19 AM, Noah Misch n...@leadboat.com wrote:


 At that point in the investigation, I realized that the cost of being able
 to remove entire tuples in lazy_vacuum_heap() greatly exceeds the benefit.
 Again, the benefit is being able to remove tuples whose inserting
 transaction aborted between the HeapTupleSatisfiesVacuum() call in
 heap_page_prune() and the one in lazy_scan_heap().  To make that possible,
 lazy_vacuum_heap() grabs a cleanup lock, calls PageRepairFragmentation(),
 and emits a WAL record for every page containing LP_DEAD line pointers or
 HEAPTUPLE_DEAD tuples.  If we take it out of the business of removing
 tuples, lazy_vacuum_heap() can skip WAL and update lp_flags under a mere
 shared lock.  The second attached patch, for HEAD, implements that.
 Besides optimizing things somewhat, it simplifies the code and removes
 rarely-tested branches.  (This patch supersedes the backpatch-oriented
 patch rather than stacking with it.)


+1. I'd also advocated removing the line pointer array scan in
lazy_scan_heap() because heap_page_prune() does most of the work. The
reason why we still have that scan in lazy_scan_heap() is to check for new
dead tuples (which should be rare), check for all-visibility of the page
and freeze tuples if possible. I think we can move all of that to
heap_page_prune().

But while you are at it, I wonder if you have time and energy to look at
the single-pass vacuum work that I, Robert and others tried earlier but
failed to get to the final stage [1][2]. If we do something like that, we
might not need the second scan of the heap at all, which, as you rightly
pointed out, does not do much today anyway, other than setting LP_DEAD line
pointers to LP_UNUSED. We can push that work to the next vacuum or even a
foreground task.

BTW, with respect to your idea of not WAL logging the operation of setting
LP_DEAD to LP_UNUSED in lazy_vacuum_page(), I wonder if this would create
problems with crash recovery. During normal vacuum operation, you may have
converted LP_DEAD to LP_UNUSED. Then you actually used one of those line
pointers for a subsequent PageAddItem() on the page. If you crash after that
but before the page gets written to disk, during crash recovery, AFAICS
PageAddItem() will complain with a "will not overwrite a used ItemId"
warning and a subsequent PANIC of the recovery.
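
In outline, the failing sequence I am worried about is (a sketch of the
scenario, not observed output):

    1. VACUUM sets an item LP_DEAD -> LP_UNUSED without emitting WAL.
    2. An INSERT reuses that slot via PageAddItem(); the insert is
       WAL-logged.
    3. Crash before the dirtied page reaches disk.
    4. Redo replays the insert against the on-disk page, where the item
       is still LP_DEAD; PageAddItem() refuses to overwrite a used
       ItemId, and recovery PANICs.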

Thanks,
Pavan

[1] http://archives.postgresql.org/pgsql-hackers/2011-07/msg00624.php
[2]
http://archives.postgresql.org/message-id/caboikdphax5ugugb9rjnsj+zveytv8sn4ctyfcmbc47r6_b...@mail.gmail.com


-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


[HACKERS] lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples

2013-01-07 Thread Noah Misch
Per this comment in lazy_scan_heap(), almost all tuple removal these days
happens in heap_page_prune():

			case HEAPTUPLE_DEAD:

				/*
				 * Ordinarily, DEAD tuples would have been removed by
				 * heap_page_prune(), but it's possible that the tuple
				 * state changed since heap_page_prune() looked.  In
				 * particular an INSERT_IN_PROGRESS tuple could have
				 * changed to DEAD if the inserter aborted.  So this
				 * cannot be considered an error condition.

vacuumlazy.c remains responsible for noticing the LP_DEAD line pointers left
by heap_page_prune(), removing corresponding index entries, and marking those
line pointers LP_UNUSED.
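
In outline (my summary of the division of labor, not text from the code):

    lazy_scan_heap():    prune each page; remember the LP_DEAD TIDs
    lazy_vacuum_index(): delete the index entries pointing at those TIDs
    lazy_vacuum_heap():  revisit the pages and set LP_DEAD -> LP_UNUSED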

Nonetheless, lazy_vacuum_heap() retains the ability to remove actual
HEAPTUPLE_DEAD tuples and reclaim their LP_NORMAL line pointers.  This support
gets exercised only in the scenario described in the above comment.  For hot
standby, this capability requires its own WAL record, XLOG_HEAP2_CLEANUP_INFO,
to generate the necessary conflicts[1].  There is a bug in lazy_scan_heap()'s
bookkeeping for the xid to place in that WAL record.  Each call to
heap_page_prune() simply overwrites vacrelstats->latestRemovedXid, but
lazy_scan_heap() expects it to only ever increase the value.  I have
attached a minimal fix to be backpatched.  It has lazy_scan_heap() ignore
heap_page_prune()'s actions for the purpose of this conflict xid, because
heap_page_prune() emitted an XLOG_HEAP2_CLEAN record covering them.

At that point in the investigation, I realized that the cost of being able to
remove entire tuples in lazy_vacuum_heap() greatly exceeds the benefit.
Again, the benefit is being able to remove tuples whose inserting transaction
aborted between the HeapTupleSatisfiesVacuum() call in heap_page_prune() and
the one in lazy_scan_heap().  To make that possible, lazy_vacuum_heap() grabs
a cleanup lock, calls PageRepairFragmentation(), and emits a WAL record for
every page containing LP_DEAD line pointers or HEAPTUPLE_DEAD tuples.  If we
take it out of the business of removing tuples, lazy_vacuum_heap() can skip
WAL and update lp_flags under a mere shared lock.  The second attached patch,
for HEAD, implements that.  Besides optimizing things somewhat, it simplifies
the code and removes rarely-tested branches.  (This patch supersedes the
backpatch-oriented patch rather than stacking with it.)

The bookkeeping behind the "page containing dead tuples is marked as
all-visible in relation" warning is also faulty; it only fires when
lazy_scan_heap() saw the HEAPTUPLE_DEAD tuple; again, heap_page_prune() will
be the one to see it in almost every case.  I changed the warning to fire
whenever the table cannot be marked all-visible for a reason other than the
presence of too-recent live tuples.

I considered renaming lazy_vacuum_heap() to lazy_heap_clear_dead_items(),
reflecting its narrower role.  Ultimately, I left function names unchanged.

This patch conflicts textually with Pavan's Setting visibility map in
VACUUM's second phase patch, but I don't see any conceptual incompatibility.

I can't give a simple statement of the performance improvement here.  The
XLOG_HEAP2_CLEAN record is fairly compact, so the primary benefit of avoiding
it is the possibility of avoiding a full-page image.  For example, if a
checkpoint lands just before the VACUUM and again during the index-cleaning
phase (assume just one such phase in this example), this patch reduces
heap-related WAL volume by almost 50%.  Improvements as extreme as 2% and 97%
are possible given other timings of checkpoints relative to the VACUUM.  In
general, expect this to help VACUUMs spanning several checkpoint cycles more
than it helps shorter VACUUMs.  I have attached a script I used as a reference
workload for testing different checkpoint timings.  There should also be some
improvement from keeping off WALInsertLock, not requiring WAL flushes to evict
from the ring buffer during the lazy_vacuum_heap() phase, and not taking a
second buffer cleanup lock.  I did not attempt to quantify those.

Thanks,
nm

[1] Normally, heap_page_prune() removes the tuple first (leaving an LP_DEAD
line pointer), and vacuumlazy.c removes index entries afterward.  When the
removal happens in this order, the XLOG_HEAP2_CLEAN record takes care of
conflicts.  However, in the rarely-used code path, we remove the index entries
before removing the tuple.  XLOG_HEAP2_CLEANUP_INFO conflicts with standby
snapshots that might need the vanishing index entries.
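
Schematically (my restatement of the two orderings, for clarity):

    usual order:  heap_page_prune() truncates the tuple to LP_DEAD, and
                  its XLOG_HEAP2_CLEAN record carries the conflict xid;
                  index entries are removed afterward.

    rare order:   index entries are removed before the tuple itself, so
                  XLOG_HEAP2_CLEANUP_INFO must supply the conflict xid
                  before index vacuuming begins.
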
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***
*** 482,487  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
--- 482,488 
bool