Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-31 Thread Decibel!

On Jul 30, 2007, at 8:00 PM, Alvaro Herrera wrote:

ITAGAKI Takahiro wrote:

Alvaro Herrera [EMAIL PROTECTED] wrote:
I think we might need additional freezing-xmax operations to avoid
XID-wraparound in the first pass of vacuum, though it hardly occurs.


I'm not sure I follow.  Can you elaborate?  Do you mean storing a
separate relfrozenxmax for each table or something like that?


We need to work around wraparound of xmax in dead tuples. If we fail to
vacuum them and XID wraps around, we cannot remove them until the next
XID-wraparound, because we treat them as deleted in the *future*.


Oh, but this should not be a problem, because a tuple is either frozen
or removed completely -- xmax cannot precede xmin.


What if it's frozen, then deleted, and then we wrap on xmax? Wouldn't
that make the tuple re-appear?

--
Decibel!, aka Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-31 Thread Alvaro Herrera
Decibel! wrote:
 On Jul 30, 2007, at 8:00 PM, Alvaro Herrera wrote:
 ITAGAKI Takahiro wrote:
 Alvaro Herrera [EMAIL PROTECTED] wrote:
 I think we might need additional freezing-xmax operations to avoid
 XID-wraparound in the first pass of vacuum, though it hardly occurs.

 I'm not sure I follow.  Can you elaborate?  Do you mean storing a
 separate relfrozenxmax for each table or something like that?

 We need to work around wraparound of xmax in dead tuples. If we fail to
 vacuum them and XID wraps around, we cannot remove them until the next
 XID-wraparound, because we treat them as deleted in the *future*.

 Oh, but this should not be a problem, because a tuple is either frozen
 or removed completely -- xmax cannot precede xmin.

 What if it's frozen, then deleted, and then we wrap on xmax? Wouldn't that 
 make the tuple re-appear?

That cannot happen, because the next vacuum will remove the tuple if the
Xmax is committed.  If the deleting transaction aborts, then vacuum will
set Xmax to Invalid (see heap_freeze_tuple in heapam.c).
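
In simplified form, that xmax handling looks roughly like this (a
paraphrase of the heap_freeze_tuple logic, with locking and details
elided -- not the actual heapam.c code):

    TransactionId xmax = HeapTupleHeaderGetXmax(tuple);

    if (TransactionIdIsNormal(xmax) &&
        TransactionIdPrecedes(xmax, cutoff_xid))
    {
        /*
         * A committed old xmax never survives to this point: vacuum
         * would already have removed the tuple as dead.  So an old
         * xmax seen here must belong to an aborted deleter, and we
         * can simply reset it.
         */
        HeapTupleHeaderSetXmax(tuple, InvalidTransactionId);
        tuple->t_infomask &= ~HEAP_XMAX_COMMITTED;
        tuple->t_infomask |= HEAP_XMAX_INVALID;
    }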

One potential problem you would see is if the deleting transaction marks
the tuple deleted and then does not commit for 2 billion transactions;
vacuum would not be able to remove it because it shows up as
delete-in-progress.  However, there are plenty of other problems you
would hit in that case (autovacuum starting to misbehave being the first
you would probably notice).

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-31 Thread Decibel!

On Jul 30, 2007, at 1:47 PM, Alvaro Herrera wrote:

Jim Nasby wrote:

On Jul 27, 2007, at 1:49 AM, Alvaro Herrera wrote:

ITAGAKI Takahiro wrote:
It would be cool if we could do something like sweep a range of pages,
initiate IO for those that are not in shared buffers, and while that is
running, lock and clean up the ones that are in shared buffers, skipping
those that are not lockable right away; when that's done, go back to
those buffers that were gotten from I/O and clean those up.  And retry


Would that be substantially easier than just creating a bgreader?


I'm not sure about easier, but I'm not sure that the bgreader can do the
same job.  ISTM that the bgreader would be mostly in charge of reading
in advance of backends, whereas what I'm proposing is mostly about
finding the best spot for locking.  It might turn out to be more trouble
than it's worth though, for sure.  And in any case I'm not in a hurry to
implement it.


I was referring specifically to the "read in what's not already in
shared buffers" part of Itagaki-san's message... that seems to be
something best suited for a bgreader.

--
Decibel!, aka Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-30 Thread Alvaro Herrera
Jim Nasby wrote:
 On Jul 27, 2007, at 1:49 AM, Alvaro Herrera wrote:
 ITAGAKI Takahiro wrote:
 Simon Riggs [EMAIL PROTECTED] wrote:

 Read the heap blocks in sequence, but make a conditional lock for
 cleanup on each block. If we don't get it, sleep, then try again when we
 wake up. If we fail the second time, just skip the block completely.

 It would be cool if we could do something like sweep a range of pages,
 initiate IO for those that are not in shared buffers, and while that is
 running, lock and clean up the ones that are in shared buffers, skipping
 those that are not lockable right away; when that's done, go back to
 those buffers that were gotten from I/O and clean those up.  And retry
 the locking for those that couldn't be locked the first time around,
 also conditionally.  And when that's all done, a third pass could get
 those blocks that weren't cleaned up in any of the previous passes (and
 this time the lock would not be conditional).

 Would that be substantially easier than just creating a bgreader?

I'm not sure about easier, but I'm not sure that the bgreader can do the
same job.  ISTM that the bgreader would be mostly in charge of reading
in advance of backends, whereas what I'm proposing is mostly about
finding the best spot for locking.  It might turn out to be more trouble
than it's worth though, for sure.  And in any case I'm not in a hurry to
implement it.

In any case I'm not so sure about skipping vacuuming a block if it's not
lockable.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
To create is as difficult as to be free (Elsa Triolet)



Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-30 Thread ITAGAKI Takahiro

Alvaro Herrera [EMAIL PROTECTED] wrote:

  I think we might need additional freezing-xmax operations to avoid
  XID-wraparound in the first pass of vacuum, though it hardly occurs.
 
 I'm not sure I follow.  Can you elaborate?  Do you mean storing a
 separate relfrozenxmax for each table or something like that?

We need to work around wraparound of xmax in dead tuples. If we fail to
vacuum them and XID wraps around, we cannot remove them until the next
XID-wraparound, because we treat them as deleted in the *future*.
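
To see why, recall that XIDs are compared circularly, as in
TransactionIdPrecedes in access/transam/transam.c. A minimal standalone
sketch of that rule (simplified; not the actual server code):

    #include <stdint.h>
    #include <stdbool.h>

    typedef uint32_t TransactionId;

    /* id1 is "older" than id2 if it precedes it modulo 2^31. */
    static bool
    xid_precedes(TransactionId id1, TransactionId id2)
    {
        int32_t diff = (int32_t) (id1 - id2);
        return diff < 0;
    }

Once the XID counter has advanced about 2^31 past a dead tuple's xmax,
xid_precedes(xmax, current_xid) flips to false: the deletion now looks
like it happened in the future, so the tuple survives visibility checks
until the counter wraps past it again.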


  We just add the XID of the vacuum to dead tuples we see in the
  first pass. When backends find a dead tuple and see the transaction
  identified by XID in it has committed, they can freely reuse the area of
  the dead tuple because we can assume index entries pointing to the tuple
  have been removed by the vacuum.
 
 I would be worried about leftover index entries being later used by new
 tuples in the heap.  Then when you visit the index, find that entry, go
 to the heap and find the new tuple and return it, which could be bogus.

To avoid dangling index entries, I'm thinking about reusing dead tuples
only if we see that the VACUUM transaction has committed successfully.
That means the VACUUM transaction removed all index entries corresponding
to those dead tuples; they are now heap-only tuples, so we can recycle
them in the same manner as HOT-updated tuples.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-30 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:
 
 Alvaro Herrera [EMAIL PROTECTED] wrote:
 
   I think we might need additional freezing-xmax operations to avoid
   XID-wraparound in the first pass of vacuum, though it hardly occurs.
  
  I'm not sure I follow.  Can you elaborate?  Do you mean storing a
  separate relfrozenxmax for each table or something like that?
 
 We need to work around wraparound of xmax in dead tuples. If we fail to
 vacuum them and XID wraps around, we cannot remove them until the next
 XID-wraparound, because we treat them as deleted in the *future*.

Oh, but this should not be a problem, because a tuple is either frozen
or removed completely -- xmax cannot precede xmin.


   We just add the XID of the vacuum to dead tuples we see in the
   first pass. When backends find a dead tuple and see the transaction
   identified by XID in it has committed, they can freely reuse the area of
   the dead tuple because we can assume index entries pointing to the tuple
   have been removed by the vacuum.
  
  I would be worried about leftover index entries being later used by new
  tuples in the heap.  Then when you visit the index, find that entry, go
  to the heap and find the new tuple and return it, which could be bogus.
 
 To avoid dangling index entries, I'm thinking about reusing dead tuples
 only if we see that the VACUUM transaction has committed successfully.
 That means the VACUUM transaction removed all index entries corresponding
 to those dead tuples; they are now heap-only tuples, so we can recycle
 them in the same manner as HOT-updated tuples.

Hmm.  OK, I admit I have no idea how HOT works.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-28 Thread Jim Nasby

On Jul 27, 2007, at 1:49 AM, Alvaro Herrera wrote:

ITAGAKI Takahiro wrote:

Simon Riggs [EMAIL PROTECTED] wrote:


Read the heap blocks in sequence, but make a conditional lock for
cleanup on each block. If we don't get it, sleep, then try again when we
wake up. If we fail the second time, just skip the block completely.


It would be cool if we could do something like sweep a range of pages,
initiate IO for those that are not in shared buffers, and while that is
running, lock and clean up the ones that are in shared buffers, skipping
those that are not lockable right away; when that's done, go back to
those buffers that were gotten from I/O and clean those up.  And retry
the locking for those that couldn't be locked the first time around,
also conditionally.  And when that's all done, a third pass could get
those blocks that weren't cleaned up in any of the previous passes (and
this time the lock would not be conditional).


Would that be substantially easier than just creating a bgreader?
--
Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-27 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:
 Simon Riggs [EMAIL PROTECTED] wrote:
 
  Read the heap blocks in sequence, but make a conditional lock for
  cleanup on each block. If we don't get it, sleep, then try again when we
  wake up. If we fail the second time, just skip the block completely.

It would be cool if we could do something like sweep a range of pages,
initiate IO for those that are not in shared buffers, and while that is
running, lock and clean up the ones that are in shared buffers, skipping
those that are not lockable right away; when that's done, go back to
those buffers that were gotten from I/O and clean those up.  And retry
the locking for those that couldn't be locked the first time around,
also conditionally.  And when that's all done, a third pass could get
those blocks that weren't cleaned up in any of the previous passes (and
this time the lock would not be conditional).

Then do a vacuum_delay sleep.
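
In very rough C, the shape might be something like this (a sketch only;
buffer_is_cached, start_read_ahead, try_lock_cleanup, lock_cleanup and
clean_block are hypothetical placeholders standing in for whatever
bufmgr primitives would actually back this):

    static void
    sweep_range(BlockNumber start, BlockNumber end)
    {
        List       *from_io = NIL;   /* blocks being read from disk */
        List       *pending = NIL;   /* blocks we couldn't lock in pass 1 */
        List       *leftover = NIL;  /* still locked after pass 2 */
        ListCell   *lc;
        BlockNumber blk;

        /* Pass 1: start IO for uncached blocks; clean lockable cached ones. */
        for (blk = start; blk < end; blk++)
        {
            if (!buffer_is_cached(blk))
            {
                start_read_ahead(blk);            /* asynchronous read */
                from_io = lappend_int(from_io, blk);
            }
            else if (try_lock_cleanup(blk))       /* conditional lock */
                clean_block(blk);
            else
                pending = lappend_int(pending, blk);
        }

        /* Pass 2: clean what came in from IO, and retry the blocks we
         * couldn't lock before, still conditionally. */
        foreach(lc, from_io)
        {
            blk = (BlockNumber) lfirst_int(lc);
            if (try_lock_cleanup(blk))
                clean_block(blk);
            else
                leftover = lappend_int(leftover, blk);
        }
        foreach(lc, pending)
        {
            blk = (BlockNumber) lfirst_int(lc);
            if (try_lock_cleanup(blk))
                clean_block(blk);
            else
                leftover = lappend_int(leftover, blk);
        }

        /* Pass 3: whatever remains gets an unconditional cleanup lock. */
        foreach(lc, leftover)
        {
            blk = (BlockNumber) lfirst_int(lc);
            lock_cleanup(blk);
            clean_block(blk);
        }

        /* ... then the vacuum_delay sleep. */
    }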

 When we allow some skips in removing dead tuples, can we guarantee
 pg_class.relfrozenxid?

No we can't.

 I think we might need additional freezing-xmax operations to avoid
 XID-wraparound in the first pass of vacuum, though it hardly occurs.

I'm not sure I follow.  Can you elaborate?  Do you mean storing a
separate relfrozenxmax for each table or something like that?

 It might be a future topic ... if we are in the direction of
 optimistic sweeping, is it possible to remove the second pass of vacuum
 completely? We just add the XID of the vacuum to dead tuples we see in the
 first pass. When backends find a dead tuple and see the transaction
 identified by XID in it has committed, they can freely reuse the area of
 the dead tuple because we can assume index entries pointing to the tuple
 have been removed by the vacuum.

I would be worried about leftover index entries being later used by new
tuples in the heap.  Then when you visit the index, find that entry, go
to the heap and find the new tuple and return it, which could be bogus.
(Unless, I think, you check in the index when you are going to insert
the new index tuple -- if the CTID is already used, reuse that entry or
remove it before insertion).

I don't know.  Maybe it's OK but it seems messy even if it is.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Quick idea for reducing VACUUM contention

2007-07-26 Thread ITAGAKI Takahiro
Simon Riggs [EMAIL PROTECTED] wrote:

 Read the heap blocks in sequence, but make a conditional lock for
 cleanup on each block. If we don't get it, sleep, then try again when we
 wake up. If we fail the second time, just skip the block completely.

When we allow some skips in removing dead tuples, can we guarantee
pg_class.relfrozenxid? I think we might need additional freezing-xmax
operations to avoid XID-wraparound in the first pass of vacuum, though
it hardly occurs.


It might be a future topic ... if we are in the direction of
optimistic sweeping, is it possible to remove the second pass of vacuum
completely? We just add the XID of the vacuum to dead tuples we see in the
first pass. When backends find a dead tuple and see the transaction
identified by XID in it has committed, they can freely reuse the area of
the dead tuple because we can assume index entries pointing to the tuple
have been removed by the vacuum. We would use the infrastructure
introduced by HOT for this purpose.
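
Roughly, the backend-side check might look like this (a sketch under the
assumption that the vacuum's XID has been stamped on the dead tuple;
tuple_get_vacuum_xid is a hypothetical accessor, while
TransactionIdDidCommit is the existing call in access/transam/transam.c):

    /* Can this backend reuse the space of a dead tuple? */
    static bool
    dead_tuple_reusable(HeapTupleHeader tuple)
    {
        /* hypothetical accessor for the vacuum XID stamped on the tuple */
        TransactionId vacxid = tuple_get_vacuum_xid(tuple);

        if (!TransactionIdIsValid(vacxid))
            return false;       /* no vacuum has processed it yet */

        /*
         * If the stamping vacuum committed, its index-cleanup pass is
         * complete, so no index entry can still point at this tuple and
         * its space may be recycled like a dead HOT tuple.
         */
        return TransactionIdDidCommit(vacxid);
    }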

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





[HACKERS] Quick idea for reducing VACUUM contention

2007-07-26 Thread Simon Riggs
Just wanted to record a quick idea in case it's useful in the future.

VACUUM reads all blocks in sequence and waits on each one to acquire a
cleanup lock.

If VACUUM is running with vacuum_delay enabled then we might take a
slightly different approach:

Read the heap blocks in sequence, but make a conditional lock for
cleanup on each block. If we don't get it, sleep, then try again when we
wake up. If we fail the second time, just skip the block completely.

As long as we skip no more than 1% of the blocks, we should be able to do
a very good job of cleanup, yet with reduced block contention as the
VACUUM proceeds.
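
In sketch form, per block (ConditionalLockBufferForCleanup here is
assumed: a conditional variant of the existing LockBufferForCleanup.
Illustrative only, not working vacuum code):

    buf = ReadBuffer(onerel, blkno);

    if (!ConditionalLockBufferForCleanup(buf))
    {
        /* Contended: this is where the vacuum_delay sleep goes. */
        pg_usleep(VacuumCostDelay * 1000L);

        if (!ConditionalLockBufferForCleanup(buf))
        {
            /* Still contended: skip the block completely. */
            ReleaseBuffer(buf);
            continue;           /* on to the next block in the scan */
        }
    }

    /* ... clean the page, then UnlockReleaseBuffer(buf) ... */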

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com

