Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Hannu Krosing
On one fine day, Sat, 2006-06-24 at 19:36, Bruce Momjian wrote:
 Hannu Krosing wrote:
  On one fine day, Fri, 2006-06-23 at 13:08, Tom Lane wrote:

   
   Bottom line: there's still lots of low-hanging fruit.  Why are people
   feeling that we need to abandon or massively complicate our basic
   architecture to make progress?
  
  Maybe we could start from reusing the index tuples which point to
  invisible tuples ? The index is not MVCC anyway, so maybe it is easier
  to do in-place replacement there ?
  
  This probably has the same obstacles which have prevented us from
  removing those in the first place (removing instead of marking as
  invisible). Does it cause some locking issues ? Or does it go against
  some other constraints of our index lookups ?
  
  I think that just setting the invisible bit in an index leaf node causes
  nearly as much disk io as removing the node.
  
  If we could delete/reuse old index tuples, it would solve a sizable
  chunk of index-growth problem, especially for cases where referenced key
  value does not change.
 
 I think heap _and_ index reuse is the only useful direction.  Index or
 heap reuse alone seems too marginal for the added complexity.

Sure, but index reuse seems a lot easier, as there is nothing additional
to remember or clean out when doing it.

When reusing a heap tuple you have to clean out all index entries
pointing to it.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-25 Thread Michael Meskes
On Sat, Jun 24, 2006 at 09:45:45PM -0400, Bruce Momjian wrote:
 Michael, I saw your patch stating that the copyright was assigned to
 PGDG.  However, once that happens, our policy is to remove copyright
 notices for individual users because they confuse things.
 
 Therefore, I have updated your applied patch to just mention the
 author's name, email address, and date.

If that suffices, fine with me. 

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Mark Woodward
 On 6/24/2006 9:23 AM, Mark Woodward wrote:

 On Sat, 24 Jun 2006, Mark Woodward wrote:

 I'm probably mistaken, but aren't there already forward references in
 tuples to later versions? If so, I'm only suggesting reversing the
 order
 and referencing the latest version.

 I thought I understood your idea, but now you lost me again. I thought
 what you want is that the older heap tuple has a pointer to the
 newer one. Which it already has, it's called t_ctid.

 Perfect!

 Can you try to explain more carefully how the whole thing would work?
 What would an index tuple point to? What pointers would a heap tuple
 have? What would an index scan do to find the row version it's
 interested
 in? What exactly would an update do?


 Since we already allocate space for some notion of linked list, then all
 I'm suggesting is we reverse the order, sort of. Currently it looks like
 this:

 ver001 -> ver002 -> ver003 -> ... -> verN

 That's what t_ctid does now, right? Well, that's sort of stupid. Why not
 have it do this:

 ver001 -> verN -> ... -> ver003 -> ver002 --+
    ^----------------------------------------+

 This will speed up almost *all* queries when there are more than two
 versions of a row.

 OK, here is the behavior of an update:
 (1) Find the latest version of the row
 (2) Duplicate the row and modify as per plan
 (3) Set the t_ctid of the new row to the previous latest version
 (4) Set the t_ctid of the first row to that of the new row
 (5) Attempt to index the row
 (6) If the first version of the row (ver001) is already in the index,
 don't modify the index; otherwise, add the new version (just as before)

 When you vacuum, simply make the latest version (verN) the key row
 (ver001).
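
(A self-contained C toy of the reversed chain described above; Version and
update_row are hypothetical names, and this sketches the idea only, not
PostgreSQL's real tuple format:)

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Version {
        int             value;
        struct Version *next;          /* plays the role of t_ctid */
    } Version;

    /* Steps (2)-(4) above: link the new version in directly behind the
     * anchored root (ver001), so root->next is always the latest and
     * older versions follow in reverse order. */
    static void update_row(Version *root, int new_value)
    {
        Version *v = malloc(sizeof(Version));
        v->value = new_value;
        v->next  = root->next;         /* (3) new row -> previous latest */
        root->next = v;                /* (4) root -> new row */
    }

    int main(void)
    {
        Version root = { 1, NULL };    /* ver001 */
        update_row(&root, 2);          /* ver002 */
        update_row(&root, 3);          /* ver003 */
        /* the latest version is one hop from the anchored root */
        printf("latest = %d\n", root.next->value);   /* prints: latest = 3 */
        return 0;
    }

The point of the reversal is that a reader reaches the newest version in one
hop from the slot the index points at, instead of walking the whole chain.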

 This isn't done simply. Currently, vacuum collects a trivial array of
 ctid's it is removing and every now and then does a bulk remove of the
 index tuples pointing to them. Now let's consider a table with two
 indexed columns with the following row versions resulting from an insert
 and 3 updates to that same row:

v1:  a,b
v2:  a,c
v3:  a,d
v4:  b,d

 In your new scheme, there would be two index tuples for column 1 (one
 pointing to v1, one pointing to v4) and 3 index tuples for column 2 (one
 for each different value pointing to v1, v2 and v3). Did I get that
 right so far?

 If vacuum now can remove v1, it has to update index 1 to point to v2 and
 remove the pointer to v1 from index 2. If it can remove v1 and v2, it
 has to update index 1 to point to v3 and remove v1 and v2 from index 2.
 If it can remove v1, v2 and v3 it must delete the index 1 tuple pointing
 to v1, delete the index 2 entries pointing to v1 and v2 and update the
 index 2 entry for v3 to point to v4. Figuring out which index tuples to
 remove and which ones to update can only be done by comparing each and
 every indexed column's old and new values. To do so, vacuum will have to
 fetch all the row versions, which can be scattered all over the place,
 with all possible locking issues including but not limited to deadlocks.

I'm not sure why vacuum can't run similarly to the way it does now.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Buffer for inner and outer table

2006-06-25 Thread Daniel Xavier de Sousa
Hi Álvaro,

Thank you for your answer. I thought that Postgres could manage the space
for outer and inner tables, because some articles have shown how this makes
a difference.

Alvaro, please: about the count of pages that Postgres reads during a
nested-loop join (from RAM and disk), do you know anything about that?

Thanks,
Daniel

Alvaro Herrera [EMAIL PROTECTED] wrote:
 Daniel Xavier de Sousa wrote:
  Can somebody tell me where Postgres controls the buffer for inner and
  outer tables when it executes Nest_loop_join? I would want to know how
  to change the size of this buffer and see all statistics about it.

 There is no such buffer.  Buffers used in scans are kept in
 shared_buffers, just like for everything else.

 --
 Alvaro Herrera    http://www.CommandPrompt.com/
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Bruce Momjian
Hannu Krosing wrote:
   Maybe we could start from reusing the index tuples which point to
   invisible tuples ? The index is not MVCC anyway, so maybe it is easier
   to do in-place replacement there ?
   
   This probably has the same obstacles which have prevented us from
   removing those in the first place (removing instead of marking as
   invisible). Does it cause some locking issues ? Or does it go against
   some other constraints of our index lookups ?
   
   I think that just setting the invisible bit in an index leaf node causes
   nearly as much disk io as removing the node.
   
   If we could delete/reuse old index tuples, it would solve a sizable
   chunk of index-growth problem, especially for cases where referenced key
   value does not change.
  
  I think heap _and_ index reuse is the only useful direction.  Index or
  heap reuse alone seems too marginal for the added complexity.
 
 Sure, but index reuse seems a lot easier, as there is nothing additional
 to remember or clean out when doing it.

Yes, seems so.  TODO added:

* Reuse index tuples that point to heap tuples that are not visible to
  anyone?

 When reusing a heap tuple you have to clean out all index entries
 pointing to it.

Well, not for an UPDATE with no key changes that stays on the same page, if we do that.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum row?

2006-06-25 Thread Jonah H. Harris

On 6/24/06, Mark Woodward [EMAIL PROTECTED] wrote:

I originally suggested a methodology for preserving MVCC, and everyone is
confusing it with update-in-place; this is not what I intended.


Actually, you should've presented your idea as performing MVCC the way
Firebird does... the idea is basically the same.  Doing some research
never hurts... especially with this crowd.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation    | fax: 732.331.1301
33 Wood Ave S, 2nd Floor    | [EMAIL PROTECTED]
Iselin, New Jersey 08830    | http://www.enterprisedb.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Jan Wieck

On 6/25/2006 6:52 AM, Mark Woodward wrote:

On 6/24/2006 9:23 AM, Mark Woodward wrote:


On Sat, 24 Jun 2006, Mark Woodward wrote:


I'm probably mistaken, but aren't there already forward references in
tuples to later versions? If so, I'm only suggesting reversing the
order
and referencing the latest version.


I thought I understood your idea, but now you lost me again. I thought
what you want is that the older heap tuple has a pointer to the
newer one. Which it already has, it's called t_ctid.


Perfect!


Can you try to explain more carefully how the whole thing would work?
What would an index tuple point to? What pointers would a heap tuple
have? What would an index scan do to find the row version it's
interested
in? What exactly would an update do?



Since we already allocate space for some notion of linked list, then all
I'm suggesting is we reverse the order, sort of. Currently it looks like
this:

ver001 -> ver002 -> ver003 -> ... -> verN

That's what t_ctid does now, right? Well, that's sort of stupid. Why not
have it do this:

ver001 -> verN -> ... -> ver003 -> ver002 --+
   ^----------------------------------------+

This will speed up almost *all* queries when there are more than two
versions of a row.

OK, here is the behavior of an update:
(1) Find the latest version of the row
(2) Duplicate the row and modify as per plan
(3) Set the t_ctid of the new row to the previous latest version
(4) Set the t_ctid of the first row to that of the new row
(5) Attempt to index the row
(6) If the first version of the row (ver001) is already in the index,
don't modify the index; otherwise, add the new version (just as before)

When you vacuum, simply make the latest version (verN) the key row
(ver001).


This isn't done simply. Currently, vacuum collects a trivial array of
ctid's it is removing and every now and then does a bulk remove of the
index tuples pointing to them. Now let's consider a table with two
indexed columns with the following row versions resulting from an insert
and 3 updates to that same row:

   v1:  a,b
   v2:  a,c
   v3:  a,d
   v4:  b,d

In your new scheme, there would be two index tuples for column 1 (one
pointing to v1, one pointing to v4) and 3 index tuples for column 2 (one
for each different value pointing to v1, v2 and v3). Did I get that
right so far?

If vacuum now can remove v1, it has to update index 1 to point to v2 and
remove the pointer to v1 from index 2. If it can remove v1 and v2, it
has to update index 1 to point to v3 and remove v1 and v2 from index 2.
If it can remove v1, v2 and v3 it must delete the index 1 tuple pointing
to v1, delete the index 2 entries pointing to v1 and v2 and update the
index 2 entry for v3 to point to v4. Figuring out which index tuples to
remove and which ones to update can only be done by comparing each and
every indexed column's old and new values. To do so, vacuum will have to
fetch all the row versions, which can be scattered all over the place,
with all possible locking issues including but not limited to deadlocks.


I'm not sure why vacuum can't run similarly to the way it does now.


What's that supposed to mean?


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Jan Wieck

On 6/25/2006 12:27 PM, Bruce Momjian wrote:


Hannu Krosing wrote:

  Maybe we could start from reusing the index tuples which point to
  invisible tuples ? The index is not MVCC anyway, so maybe it is easier
  to do in-place replacement there ?
  
  This probably has the same obstacles which have prevented us from

  removing those in the first place (removing instead of marking as
  invisible). Does it cause some locking issues ? Or does it go against
  some other constraints of our index lookups ?
  
  I think that just setting the invisible bit in an index leaf node causes

  nearly as much disk io as removing the node.
  
  If we could delete/reuse old index tuples, it would solve a sizable

  chunk of index-growth problem, especially for cases where referenced key
  value does not change.
 
 I think heap _and_ index reuse is the only useful direction.  Index or

 heap reuse alone seems too marginal for the added complexity.

Sure, but index reuse seems a lot easier, as there is nothing additional
to remember or clean out when doing it.


Yes, seems so.  TODO added:

* Reuse index tuples that point to heap tuples that are not visible to
  anyone?


When reusing a heap tuple you have to clean out all index entries
pointing to it.


Well, not for an UPDATE with no key changes that stays on the same page, if we do that.



An update that results in all the same values of every indexed column of 
a known deleted invisible tuple. This reused tuple can by definition not 
be the one currently updated. So unless it is a table without a primary 
key, this assumes that at least 3 versions of the same row exist within 
the same block. How likely is that to happen?



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Jan Wieck

On 6/24/2006 4:10 PM, Hannu Krosing wrote:


On one fine day, Sat, 2006-06-24 at 15:44, Jan Wieck wrote:


 That fixes the symptom, not the problem. The problem is performance
 steadily degrades over time.
 
 No, you got it backwards.  The performance degradation is the symptom.

 The problem is that there are too many dead tuples in the table.  There
 is one way to solve that problem -- remove them, which is done by
 running vacuum.

Precisely.

 There are some problems with vacuum itself, that I agree with.  For
 example it would be good if a long-running vacuum wouldn't affect a
 vacuum running in another table because of the long-running transaction
 effect it has.  It would be good if vacuum could be run partially over a
 table.  It would be good if there was a way to speed up vacuum by using
 a dead space map or something.

It would be good if vacuum wouldn't waste time on blocks that don't have 
any possible work in them. Vacuum has two main purposes. A) remove dead 
rows and B) freeze xids. Once a block has zero deleted rows and all xids 
are frozen, there is nothing to do with this block and vacuum should 
skip it until a transaction updates that block.


This requires 2 bits per block, which is 32K per 1G segment of a heap.
Clearing the bits is done when the block is marked dirty. This way
vacuum would not waste any time and I/O on huge slow-changing tables.
That part, sequentially scanning huge tables that didn't change much,
is what keeps us from running vacuum every couple of seconds.
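
(A minimal sketch of such a per-block map, assuming the default 8 KB block
size: a 1 GB segment has 131072 blocks, and 2 bits each gives the 32K figure
above. All names here are hypothetical, not PostgreSQL code:)

    #include <stdint.h>
    #include <string.h>

    #define BLOCKS_PER_SEGMENT 131072            /* 1 GB / 8 KB blocks */
    #define MAP_BYTES (BLOCKS_PER_SEGMENT / 4)   /* 2 bits per block = 32 KB */

    static uint8_t map[MAP_BYTES];

    /* Per block, bit 0 = no dead rows, bit 1 = all xids frozen.  Vacuum
     * skips a block when both bits are set; any backend that dirties
     * the block clears them again. */
    static void vacuum_done(uint32_t blk)
    {
        map[blk / 4] |= (uint8_t)(0x3 << ((blk % 4) * 2));
    }

    static void block_dirtied(uint32_t blk)
    {
        map[blk / 4] &= (uint8_t)~(0x3 << ((blk % 4) * 2));
    }

    static int vacuum_can_skip(uint32_t blk)
    {
        return ((map[blk / 4] >> ((blk % 4) * 2)) & 0x3) == 0x3;
    }

    int main(void)
    {
        memset(map, 0, sizeof map);   /* new segment: every block needs a visit */
        vacuum_done(42);              /* vacuum cleaned and froze block 42 */
        block_dirtied(42);            /* an update touched it again */
        return vacuum_can_skip(42);   /* exits 0: block 42 must be revisited */
    }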


Seems like a plan. 


Still, there is another problem which is not solved by the map approach
alone, at least with the current implementation of vacuum.

This is the fact that we need to do a full scan over the index(es) to clean up
pointers to removed tuples. And huge tables tend to have huge indexes.


Right, now that you say it I remember why this wasn't as easy as it
sounded at the beginning.


Obviously there is no way to find an index tuple other than a sequential
scan of the whole index or an index scan by key. So vacuum would have to
estimate, based on the bitmaps, whether it would be beneficial (huge
table, few vacuumable pages) to actually remove/flag single index tuples
before removing the heap tuple. This can be done in advance of removing
the heap tuple, because the index tuples might not be there to begin with.


However, that is a very costly thing to do and not trivial to implement.


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Heikki Linnakangas

On Sat, 24 Jun 2006, Bruce Momjian wrote:


Because having them be on the same page is the only way you can update
the page item pointer so that when you recycle the row, the indexes are
now pointing to the new version.  Pages look like:

[marker][item1][item2][item3]...[tuple1][tuple2][tuple3]

and indexes only point to items, not to tuples.  This allows tuples to
be compacted on the page without affecting the indexes.

If tuple1 is updated to tuple2, once tuple1 is no longer visible to any
backends, you can modify item1 to point to tuple2, and you can mark the
space used by tuple1 as reusable:

[marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]


Ok, now I think I get it. So the limitation of old and new tuple being on 
the same page is required to make it possible to remove the old tuple 
without touching the indexes?


If you move the new tuple (logically, by modifying item pointers) on 
vacuum, isn't there a risk that a concurrent seqscan misses it?



If you can't expire the old row because one of the indexed columns was
modified, I see no reason to try to reduce the additional index entries.


It won't enable early expiration, but it means less work to do on update. 
If there's a lot of indexes, not having to add so many index tuples can be 
a significant saving.


To summarise, we have two issues related to frequent updates:
1. Index and heap bloat, requiring frequent vacuum.
2. Updates are more expensive than on other DBMSs, because we have to add 
a new index tuple in every index, even if none of the indexed columns are 
modified.


Tom suggested that we just improve vacuum and autovacuum, and someone 
brought up the dead space map idea again. Those are all worthwhile things 
to do and help with vacuuming after deletes as well as updates, but they 
only address issue 1. Mark's suggestion (assuming that it would've 
worked) as well as yours address both, but only for updates.


- Heikki

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Bruce Momjian
Jan Wieck wrote:
  Sure, but index reuse seems a lot easier, as there is nothing additional
  to remember or clean out when doing it.
  
  Yes, seems so.  TODO added:
  
  * Reuse index tuples that point to heap tuples that are not visible to
anyone?
  
  When reusing a heap tuple you have to clean out all index entries
  pointing to it.
  
  Well, not for an UPDATE with no key changes that stays on the same page, if we do that.
  
 
 An update that results in all the same values of every indexed column of 
 a known deleted invisible tuple. This reused tuple can by definition not 
 be the one currently updated. So unless it is a table without a primary 
 key, this assumes that at least 3 versions of the same row exist within 
 the same block. How likely is that to happen?

Good question.  You take the current tuple, and make another one on the
same page.  Later, an update can reuse the original tuple if it is no
longer visible to anyone (by changing the item id), so you only need two
tuples, not three.  My hope is that a repeated update would eventually
move to a page that has enough free space for two (or more) versions.

Does that help explain it?

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Bruce Momjian
Heikki Linnakangas wrote:
 On Sat, 24 Jun 2006, Bruce Momjian wrote:
 
  Because having them be on the same page is the only way you can update
  the page item pointer so that when you recycle the row, the indexes are
  now pointing to the new version.  Pages look like:
 
  [marker][item1][item2][item3]...[tuple1][tuple2][tuple3]
 
  and indexes only point to items, not to tuples.  This allows tuples to
  be compacted on the page without affecting the indexes.
 
  If tuple1 is updated to tuple2, once tuple1 is no longer visible to any
  backends, you can modify item1 to point to tuple2, and you can mark the
  space used by tuple1 as reusable:
 
  [marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]
 
 Ok, now I think I get it. So the limitation of old and new tuple being on 
 the same page is required to make it possible to remove the old tuple 
 without touching the indexes?

Yes, modifying the heap page item pointer is required for reuse.
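
(For illustration, a rough self-contained C sketch of that indirection;
hypothetical names, not the real PostgreSQL page layout:)

    #include <stdio.h>

    #define PAGE_ITEMS 16

    typedef struct Tuple { int data; } Tuple;

    typedef struct Page {
        int   item[PAGE_ITEMS];    /* item pointers: slot -> tuple index */
        Tuple tuples[PAGE_ITEMS];  /* tuple storage within the page */
    } Page;

    /* An index entry stores (block, item slot) and never changes when
     * the item pointer underneath is redirected to a newer tuple. */
    static Tuple *fetch(Page *pg, int slot)
    {
        return &pg->tuples[pg->item[slot]];
    }

    int main(void)
    {
        Page pg = {0};
        pg.tuples[0].data = 1;                 /* tuple1 (old version) */
        pg.tuples[1].data = 2;                 /* tuple2 (new version) */

        pg.item[0] = 0;                        /* item1 -> tuple1 */
        printf("%d\n", fetch(&pg, 0)->data);   /* prints 1 */

        pg.item[0] = 1;                        /* recycle: item1 -> tuple2;
                                                * indexes untouched */
        printf("%d\n", fetch(&pg, 0)->data);   /* prints 2 */
        return 0;
    }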

 If you move the new tuple (logically, by modifying item pointers) on 
 vacuum, isn't there a risk that a concurrent seqscan misses it?

Well, you lock the page while updating the item pointer.  Because the
versions are on the same page, a single page lock should be fine.

  If you can't expire the old row because one of the indexed columns was
  modified, I see no reason to try to reduce the additional index entries.
 
 It won't enable early expiration, but it means less work to do on update. 
 If there's a lot of indexes, not having to add so many index tuples can be 
 a significant saving.

Already added to TODO.

* Reuse index tuples that point to heap tuples that are not visible to
  anyone?

 To summarise, we have two issues related to frequent updates:
 1. Index and heap bloat, requiring frequent vacuum.
 2. Updates are more expensive than on other DBMSs, because we have to add 
 a new index tuple in every index, even if none of the indexed columns are 
 modified.
 
 Tom suggested that we just improve vacuum and autovacuum, and someone 
 brought up the dead space map idea again. Those are all worthwhile things 
 to do and help with vacuuming after deletes as well as updates, but they 
 only address issue 1. Mark's suggestion (assuming that it would've 
 worked) as well as yours address both, but only for updates.

Agreed.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Jan Wieck

On 6/25/2006 2:24 PM, Bruce Momjian wrote:


Jan Wieck wrote:

 Sure, but index reuse seems a lot easier, as there is nothing additional
 to remember or clean out when doing it.
 
 Yes, seems so.  TODO added:
 
 	* Reuse index tuples that point to heap tuples that are not visible to

  anyone?
 
 When reusing a heap tuple you have to clean out all index entries

 pointing to it.
 
 Well, not for an UPDATE with no key changes that stays on the same page, if we do that.
 

An update that results in all the same values of every indexed column of 
a known deleted invisible tuple. This reused tuple can by definition not 
be the one currently updated. So unless it is a table without a primary 
key, this assumes that at least 3 versions of the same row exist within 
the same block. How likely is that to happen?


Good question.  You take the current tuple, and make another one on the
same page.  Later, an update can reuse the original tuple if it is no
longer visible to anyone (by changing the item id), so you only need two
tuples, not three.  My hope is that a repeated update would eventually
move to a page that has enough free space for two (or more) versions.

Does that help explain it?



That's exactly what I meant. You need space for 3 or more tuple versions 
within one page and the luck that one of them is invisible at the time 
of the update. I don't know how likely or unlikely this is in reality, 
but it doesn't sound very promising to me so far.


Another problem with this is that even if you find such row, it doesn't 
spare you the index traversal. The dead row whose item id you're reusing 
might have resulted from an insert that aborted or crashed before it 
finished creating all index entries. Or some of its index entries might 
already be flagged known dead, and you better reset those flags.



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Hannu Krosing
On one fine day, Sun, 2006-06-25 at 14:24, Bruce Momjian wrote:
 Jan Wieck wrote:
   Sure, but index reuse seems a lot easier, as there is nothing additional
   to remember or clean out when doing it.
   
   Yes, seems so.  TODO added:
   
 * Reuse index tuples that point to heap tuples that are not visible to
   anyone?
   
   When reusing a heap tuple you have to clean out all index entries
   pointing to it.
   
   Well, not for an UPDATE with no key changes that stays on the same page, if we do that.
   
  
  An update that results in all the same values of every indexed column of 
  a known deleted invisible tuple. This reused tuple can by definition not 
  be the one currently updated. So unless it is a table without a primary 
  key, this assumes that at least 3 versions of the same row exist within 
  the same block. How likely is that to happen?
 
 Good question.  You take the current tuple, and make another one on the
 same page.  Later, an update can reuse the original tuple if it is no
 longer visible to anyone (by changing the item id), so you only need two
 tuples, not three.  My hope is that a repeated update would eventually
  move to a page that has enough free space for two (or more) versions.

I can confirm that this is exactly what happens when running an
update-heavy load with frequent vacuums. Eventually most rows get their
own db pages or share the same page with 2-3 rows. And there will be
lots of unused (filled up, or cleaned and not yet reused) pages.

The overall performance could be made a little better by tuning the
system to not put more than N new rows on the same page at initial
insert or when a row moves to a new page during update. Currently
several new rows are initially put on the same page and then move around
during repeated updates until they slow(ish)ly claim their own page.

 Does that help explain it?
 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Hannu Krosing
On one fine day, Sun, 2006-06-25 at 06:52, Mark Woodward wrote:

 I'm not sure why vacuum can't run similarly to the way it does now.

What do you mean ?

Currently vacuum runs a three-step process

1) runs a full scan over heap and collects all dead tuple ctids from
heap

2) runs full scans over all indexes of the relation and removes any
pointers pointing to dead tuples.

3) runs another full scan over heap and removes the tuples in the list
collected at step 1.

There are no modifications done to live tuples (ok, they *may* get
frozen if they are above a certain age, but this is not relevant to the
current discussion).
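
(A toy model of those three steps, with plain arrays standing in for the
heap and a single index; all names hypothetical:)

    #include <stdbool.h>
    #include <stdio.h>

    #define NTUPLES 8

    static bool heap_used[NTUPLES];  /* slot holds a tuple             */
    static bool heap_dead[NTUPLES];  /* tuple in slot is dead          */
    static int  index_ptr[NTUPLES];  /* index entries: heap slot or -1 */

    static void toy_vacuum(void)
    {
        int dead[NTUPLES], ndead = 0;

        /* 1) full heap scan: collect ctids of dead tuples */
        for (int t = 0; t < NTUPLES; t++)
            if (heap_used[t] && heap_dead[t])
                dead[ndead++] = t;

        /* 2) full index scan: remove pointers to collected ctids */
        for (int e = 0; e < NTUPLES; e++)
            for (int d = 0; d < ndead; d++)
                if (index_ptr[e] == dead[d])
                    index_ptr[e] = -1;

        /* 3) second heap pass: reclaim the collected tuples */
        for (int d = 0; d < ndead; d++)
            heap_used[dead[d]] = false;
    }

    int main(void)
    {
        heap_used[0] = true;                       /* live tuple */
        heap_used[1] = true; heap_dead[1] = true;  /* dead tuple */
        index_ptr[0] = 0; index_ptr[1] = 1;
        for (int e = 2; e < NTUPLES; e++)
            index_ptr[e] = -1;

        toy_vacuum();
        printf("entry 1 -> %d, slot 1 used: %d\n", index_ptr[1], heap_used[1]);
        return 0;   /* prints: entry 1 -> -1, slot 1 used: 0 */
    }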

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Bruce Momjian
Jan Wieck wrote:
  An update that results in all the same values of every indexed column of 
  a known deleted invisible tuple. This reused tuple can by definition not 
  be the one currently updated. So unless it is a table without a primary 
  key, this assumes that at least 3 versions of the same row exist within 
  the same block. How likely is that to happen?
  
  Good question.  You take the current tuple, and make another one on the
  same page.  Later, an update can reuse the original tuple if it is no
  longer visible to anyone (by changing the item id), so you only need two
  tuples, not three.  My hope is that a repeated update would eventually
  move to a page that has enough free space for two (or more) versions.
  
  Does that help explain it?
  
 
 That's exactly what I meant. You need space for 3 or more tuple versions 
 within one page and the luck that one of them is invisible at the time 
 of the update. I don't know how likely or unlikely this is in reality, 
 but it doesn't sound very promising to me so far.

Why three?  I explained using only two heap tuples:

[item1]...[tuple1]

becomes on UPDATE:
                ------------->           (tuple1's t_ctid points at tuple2)
[item1]...[tuple1][tuple2]
  ------->                               (item1 points at tuple1)

on another UPDATE, if tuple1 is no longer visible:

                ------------->
[item1]...[tuple1][tuple2]
  ----------------->                     (item1 repointed at tuple2)

 Another problem with this is that even if you find such row, it doesn't 
 spare you the index traversal. The dead row whose item id you're reusing 
 might have resulted from an insert that aborted or crashed before it 
 finished creating all index entries. Or some of its index entries might 
 already be flagged known dead, and you better reset those flags.

You can only reuse heap rows that were created and expired by committed
transactions.  In fact, you can only UPDATE a row that was created by a
committed transaction.  You cannot _reuse_ any row, but only a row that
is being UPDATEd.  Also, it cannot be known dead, because we are in
the process of updating it.

I am thinking my idea was not fully understood.  Hopefully this email
helps.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Bruce Momjian
Hannu Krosing wrote:
 On one fine day, Sun, 2006-06-25 at 14:24, Bruce Momjian wrote:
  Jan Wieck wrote:
Sure, but index reuse seems a lot easier, as there is nothing 
additional
to remember or clean out when doing it.

Yes, seems so.  TODO added:

* Reuse index tuples that point to heap tuples that are not 
visible to
  anyone?

When reusing a heap tuple you have to clean out all index entries
pointing to it.

Well, not for an UPDATE with no key changes that stays on the same page, if we do that.

   
   An update that results in all the same values of every indexed column of 
   a known deleted invisible tuple. This reused tuple can by definition not 
   be the one currently updated. So unless it is a table without a primary 
   key, this assumes that at least 3 versions of the same row exist within 
   the same block. How likely is that to happen?
  
  Good question.  You take the current tuple, and make another one on the
  same page.  Later, an update can reuse the original tuple if it is no
  longer visible to anyone (by changing the item id), so you only need two
  tuples, not three.  My hope is that a repeated update would eventually
  move to a page that has enough free space for two (or more) versions.
 
 I can confirm that this is exactly what happens when running an
 update-heavy load with frequent vacuums. Eventually most rows get their
 own db pages or share the same page with 2-3 rows. And there will be
 lots of unused (filled up, or cleaned and not yet reused) pages.

Right, that was my guess, because heavily updated rows start to move
around in the table, and because UPDATE tries to stay on the same page,
once the row hits a mostly-empty page, it stays there.

 The overall performance could be made a little better by tuning the
 system to not put more than N new rows on the same page at initial
 insert or when a row moves to a new page during update. Currently
 several new rows are initially put on the same page and then move around
 during repeated updates until they slow(ish)ly claim their own page.

We have a fillfactor patch that will be in 8.2.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] xlog viewer proposal

2006-06-25 Thread Diogo Biazus
Alright, I'm working on a fast prototype using the SRF.

On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote:
 On Fri, 2006-06-23 at 10:59 -0300, Diogo Biazus wrote:
  On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote:
  - give more flexibility for managing the xlogs remotely

 Not sure what you mean.

  - I think it's faster to implement and to have a working and usable
  tool.

 Why do you think that? It sounds like you've got more work since you
 effectively need to rewrite the _desc routines.

  Yes, but I don't need to worry about program output, and I have the
  backend's memory management and error handling.

 I'd suggest doing a quick prototype to allow us to evaluate which
 architecture would be preferable.

 I'm torn between the good-idea and the safe-minimal-but-definitely-in-8.2
 option.

 --
 Simon Riggs
 EnterpriseDB    http://www.enterprisedb.com

--
Diogo Biazus - [EMAIL PROTECTED]
Móvel Consultoria
http://www.movelinfo.com.br
http://www.postgresql.org.br


[HACKERS] Table clustering idea

2006-06-25 Thread Dawid Kuroczko
There is a well known command called CLUSTER which organizes a table in
the specified index's order. It has a drawback: tuples added later are
not in this order. Last night I had an idea which could be interesting,
I hope.

The idea is to make use of the 'histogram_bounds' collected statistical
data. Instead of inserting a row into the first suitable spot in a
table, the table would be divided into sections, one for each of the
histogram_bounds ranges.

When inserting, the database would try to find the most suitable section
to insert into (using the histogram_bounds), and if there were free
spots there, would insert there. If not, it would either look for a free
spot in nearby sections, or take the first suitable place.

What would it do? It would try to keep the table somewhat organized,
keeping rows of similar values close together (within SET STATISTICS
resolution, so a common scenario would be 50 or 100 sections). It would
make it a bit harder for a table to shrink (since new rows would be
added throughout the table, not at the beginning).

Another idea, instead of using histogram_bounds, would be to use the
position of the key inside the index to determine the ideal place of the
row inside the table, and find the closest free spot there. This would
of course be much more precise and wouldn't rely on statistics. A sketch
of the histogram variant follows.

Regards,
Dawid
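
(A small sketch of how the histogram section lookup could work, assuming
integer keys; pick_section is a hypothetical name, though histogram_bounds
itself is real and visible in the pg_stats view:)

    #include <stdio.h>
    #include <stddef.h>

    /* nbounds histogram bounds define nbounds-1 equal-frequency sections */
    static int pick_section(const int *bounds, size_t nbounds, int key)
    {
        size_t lo = 0, hi = nbounds - 1;

        if (key < bounds[0])
            return 0;                       /* below range: first section */
        if (key >= bounds[nbounds - 1])
            return (int)(nbounds - 2);      /* above range: last section  */

        /* binary search for the range containing key */
        while (lo + 1 < hi) {
            size_t mid = (lo + hi) / 2;
            if (key < bounds[mid])
                hi = mid;
            else
                lo = mid;
        }
        return (int)lo;
    }

    int main(void)
    {
        int bounds[] = { 0, 100, 250, 700, 1000 };      /* 4 sections */
        printf("%d\n", pick_section(bounds, 5, 300));   /* prints 2   */
        return 0;
    }

The insert path would try that section first, then nearby sections, then
fall back to any free spot, as described above.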


Re: [HACKERS] Table clustering idea

2006-06-25 Thread Luke Lonergan
Dawid, 

 Other idea than using histogram_bounds would be using the 
 position of key inside the index to determine the ideal 
 place of row inside the table and find the closest free spot 
 there. This would be of course much more precise and wouldn't 
 rely on statistic.

This is generally known as an index-organized table in other databases.
It implements clustered storage of the table, which dramatically
speeds access on the chosen indexed column and makes inserts fast.

This also eases some of the visibility/MVCC implementation issues being
discussed on hackers, but does not help with the maintenance of
non-storage key indices.

Other DBMS have index organized tables that can use either hash or btree
organizations, both of which have their uses.  We are planning to
implement btree organized tables sometime - anyone else interested in
this idea?

- Luke


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Table clustering idea

2006-06-25 Thread Josh Berkus
Luke,

 Other DBMS have index organized tables that can use either hash or btree
 organizations, both of which have their uses.  We are planning to
 implement btree organized tables sometime - anyone else interested in
 this idea?

Search the archives.  It's been discussed on this list at least twice 
before, that I know of.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Bruce Momjian
bruce wrote:
 Why three?  I explained using only two heap tuples:
 
   [item1]...[tuple1]
 
 becomes on UPDATE:
              ------------->             (tuple1's t_ctid points at tuple2)
   [item1]...[tuple1][tuple2]
     ------->                            (item1 points at tuple1)
 
 on another UPDATE, if tuple1 is no longer visible:
 
              ------------->
   [item1]...[tuple1][tuple2]
     ----------------->                  (item1 repointed at tuple2)
 

Here is some pseudo-code that implements this:

  Definition:  Single-Index-Tuple Chain (CITC)

  Do old and new UPDATE rows have unchanged indexed columns?
      Is old row member of CITC?
          Point item id at first CITC visible tuple in chain
          Mark previous invisible CITC tuples as freespace

      Does page have free space?
          Add new tuple on the same page as old
          Mark old tuple as CITC
          Do not create index entries

VACUUM would have to be taught about CITC, and CREATE INDEX would have
to create entries in other indexes for cases where its new indexed
columns change inside a CITC.

Conceptually, CITC allows a single index entry to point to multiple
UPDATEd rows, allowing non-visible tuples to be recycled (and reused for
future UPDATEs) without affecting the index.
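
(A toy C sketch of such a chain; hypothetical names, with visibility
reduced to a boolean for illustration:)

    #include <stdio.h>
    #include <stdbool.h>

    typedef struct Ver {
        int         data;
        bool        visible;   /* visible to our snapshot? */
        struct Ver *next;      /* next version in the CITC */
    } Ver;

    /* The single index entry leads to the item id, and the item id
     * leads to the chain head; a reader walks to the first visible
     * member.  Invisible members can be recycled without any index
     * change, because the index never pointed at them directly. */
    static Ver *fetch_visible(Ver *chain_head)
    {
        for (Ver *v = chain_head; v != NULL; v = v->next)
            if (v->visible)
                return v;
        return NULL;
    }

    int main(void)
    {
        Ver v3 = { 30, true,  NULL };   /* newest version, visible */
        Ver v2 = { 20, false, &v3 };    /* expired                 */
        Ver v1 = { 10, false, &v2 };    /* expired                 */

        Ver *hit = fetch_visible(&v1);  /* item id points at v1    */
        printf("%d\n", hit ? hit->data : -1);   /* prints 30       */
        return 0;
    }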

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Jan Wieck

On 6/25/2006 5:18 PM, Bruce Momjian wrote:


Jan Wieck wrote:
 An update that results in all the same values of every indexed column of 
 a known deleted invisible tuple. This reused tuple can by definition not 
 be the one currently updated. So unless it is a table without a primary 
 key, this assumes that at least 3 versions of the same row exist within 
 the same block. How likely is that to happen?
 
 Good question.  You take the current tuple, and make another one on the

 same page.  Later, an update can reuse the original tuple if it is no
 longer visible to anyone (by changing the item id), so you only need two
 tuples, not three.  My hope is that a repeated update would eventually
 move to a page that has enough free space for two (or more) versions.
 
 Does that help explain it?
 

That's exactly what I meant. You need space for 3 or more tuple versions 
within one page and the luck that one of them is invisible at the time 
of the update. I don't know how likely or unlikely this is in reality, 
but it doesn't sound very promising to me so far.


Why three?  I explained using only two heap tuples:


For some reason I counted in the new tuple ... sorry about that. Yes, it
can work with two tuples.




[item1]...[tuple1]

becomes on UPDATE:
                ------------->           (tuple1's t_ctid points at tuple2)
[item1]...[tuple1][tuple2]
  ------->                               (item1 points at tuple1)

on another UPDATE, if tuple1 is no longer visible:

                ------------->
[item1]...[tuple1][tuple2]
  ----------------->                     (item1 repointed at tuple2)

Another problem with this is that even if you find such row, it doesn't 
spare you the index traversal. The dead row whose item id you're reusing 
might have resulted from an insert that aborted or crashed before it 
finished creating all index entries. Or some of its index entries might 
already be flagged known dead, and you better reset those flags.


You can only reuse heap rows that were created and expired by committed
transactions.  In fact, you can only UPDATE a row that was created by a
committed transaction.  You cannot _reuse_ any row, but only a row that
is being UPDATEd.  Also, it cannot be known dead, because we are in
the process of updating it.


Now you lost me. What do you mean a row that is being UPDATEd? The row 
(version) being UPDATEd right now cannot be expired, or why would you 
update that one? And if your transaction rolls back later, the row you 
update right now must be the one surviving.


Any row that was created by a committed transaction does indeed have all 
the index entries created. But if it is deleted and expired, that means 
that the transaction that stamped xmax has committed and is outside of 
every existing snapshot. You can only reuse a slot that is used by a 
tuple that satisfies the vacuum snapshot. And a tuple that satisfies 
that snapshot has potentially index entries flagged known dead.



I am thinking my idea was not fully understood.  Hopefully this email
helps.


I must be missing something because I still don't see how it can work.


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Bruce Momjian
Jan Wieck wrote:
  [item1]...[tuple1]
  
  becomes on UPDATE:
              ------------->             (tuple1's t_ctid points at tuple2)
  [item1]...[tuple1][tuple2]
    ------->                             (item1 points at tuple1)

  on another UPDATE, if tuple1 is no longer visible:

              ------------->
  [item1]...[tuple1][tuple2]
    ----------------->                   (item1 repointed at tuple2)
  
  Another problem with this is that even if you find such row, it doesn't 
  spare you the index traversal. The dead row whose item id you're reusing 
  might have resulted from an insert that aborted or crashed before it 
  finished creating all index entries. Or some of its index entries might 
  already be flagged known dead, and you better reset those flags.
  
  You can only reuse heap rows that were created and expired by committed
  transactions.  In fact, you can only UPDATE a row that was created by a
  committed transaction.  You cannot _reuse_ any row, but only a row that
  is being UPDATEd.  Also, it cannot be known dead, because we are in
  the process of updating it.
 
 Now you lost me. What do you mean a row that is being UPDATEd? The row 
 (version) being UPDATEd right now cannot be expired, or why would you 
 update that one? And if your transaction rolls back later, the row you 
 update right now must be the one surviving.

It can only be a non-visible version of the row earlier in the UPDATE
chain, not the actual one being updated.

 Any row that was created by a committed transaction does indeed have all 
 the index entries created. But if it is deleted and expired, that means 
 that the transaction that stamped xmax has committed and is outside of 
 every existing snapshot. You can only reuse a slot that is used by a 
 tuple that satisfies the vacuum snapshot. And a tuple that satisfies 
 that snapshot has potentially index entries flagged known dead.

When you are using the update chaining, you can't mark that index row as
dead because it actually points to more than one row on the page, some
are non-visible, some are visible.

  I am thinking my idea was not fully understood.  Hopefully this email
  helps.
 
 I must be missing something because I still don't see how it can work.

I just posted pseudo-code.  Hope that helps.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Jan Wieck

On 6/25/2006 10:12 PM, Bruce Momjian wrote:

When you are using the update chaining, you can't mark that index row as
dead because it actually points to more than one row on the page, some
are non-visible, some are visible.


Back up the truck ... you mean in the current code base we have heap 
tuples that are visible in index scans because of heap tuple chaining 
but without index tuples pointing directly at them?



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-25 Thread Alvaro Herrera
Jan Wieck wrote:
 On 6/25/2006 10:12 PM, Bruce Momjian wrote:
 When you are using the update chaining, you can't mark that index row as
 dead because it actually points to more than one row on the page, some
 are non-visible, some are visible.
 
 Back up the truck ... you mean in the current code base we have heap 
 tuples that are visible in index scans because of heap tuple chaining 
 but without index tuples pointing directly at them?

I don't know where this idea came from, but it's not true.  All heap
tuples, dead or otherwise, have index entries.  Unless the idea is to
extend update chaining to mean something different from the current
meaning.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly