Re: [HACKERS] vacuum, performance, and MVCC
On a fine day, Sat, 2006-06-24 at 19:36, Bruce Momjian wrote: Hannu Krosing wrote: On a fine day, Fri, 2006-06-23 at 13:08, Tom Lane wrote: Bottom line: there's still lots of low-hanging fruit. Why are people feeling that we need to abandon or massively complicate our basic architecture to make progress?

Maybe we could start from reusing the index tuples which point to invisible tuples? The index is not MVCC anyway, so maybe it is easier to do in-place replacement there? This probably has the same obstacles which have prevented us from removing those in the first place (removing instead of marking as invisible). Does it cause some locking issues? Or does it go against some other constraints of our index lookups?

I think that just setting the invisible bit in an index leaf node causes nearly as much disk io as removing the node. If we could delete/reuse old index tuples, it would solve a sizable chunk of the index-growth problem, especially for cases where the referenced key value does not change.

I think heap _and_ index reuse is the only useful direction. Index or heap reuse alone seems too marginal for the added complexity.

Sure, but index reuse seems a lot easier, as there is nothing additional to remember or clean out when doing it. When reusing a heap tuple you have to clean out all index entries pointing to it.

-- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [CORE] GPL Source and Copyright Questions
On Sat, Jun 24, 2006 at 09:45:45PM -0400, Bruce Momjian wrote: Michael, I saw your patch stating that the copyright was assigned to PGDG. However, once that happens, our policy is to remove copyright attributions to individual users because it confuses things. Therefore, I have updated your applied patch to just mention the author's name, email address, and date.

If that suffices, fine with me.

Michael

-- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED] Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
Re: [HACKERS] vacuum, performance, and MVCC
On 6/24/2006 9:23 AM, Mark Woodward wrote: On Sat, 24 Jun 2006, Mark Woodward wrote: I'm probably mistaken, but aren't there already forward references in tuples to later versions? If so, I'm only suggesting reversing the order and referencing the latest version.

I thought I understood your idea, but now you lost me again. I thought what you want is that the older heap tuple has a pointer to the newer one. Which it already has, it's called t_ctid.

Perfect!

Can you try to explain more carefully how the whole thing would work? What would an index tuple point to? What pointers would a heap tuple have? What would an index scan do to find the row version it's interested in? What exactly would an update do?

Since we already allocate space for some notion of a linked list, all I'm suggesting is that we reverse the order, sort of. Currently it looks like this:

ver001 -> ver002 -> ver003 -> ... -> verN

That's what t_ctid does now, right? Well, that's sort of stupid. Why not have it do this:

ver001 -> verN -> ... -> ver003 -> ver002

This will speed up almost *all* queries when there are more than two versions of rows.

OK, here is the behavior of an update:

(1) Find the latest version of the row
(2) Duplicate the row and modify as per plan
(3) Set the t_ctid of the new row to the last latest
(4) Set the t_ctid of the first row to that of the new row
(5) Attempt to index the row
(6) If the first version of the row is in the index already (ver001), don't modify the index; otherwise, add the new version (just as before)

When you vacuum, simply make the latest version (verN) the key row (ver001).

This isn't done simply. Currently, vacuum collects a trivial array of ctids it is removing and every now and then does a bulk remove of the index tuples pointing to them.
Now let's consider a table with two indexed columns with the following row versions resulting from an insert and 3 updates to that same row:

v1: a,b
v2: a,c
v3: a,d
v4: b,d

In your new scheme, there would be two index tuples for column 1 (one pointing to v1, one pointing to v4) and 3 index tuples for column 2 (one for each different value, pointing to v1, v2 and v3). Did I get that right so far?

If vacuum now can remove v1, it has to update index 1 to point to v2 and remove the pointer to v1 from index 2. If it can remove v1 and v2, it has to update index 1 to point to v3 and remove v1 and v2 from index 2. If it can remove v1, v2 and v3, it must delete the index 1 tuple pointing to v1, delete the index 2 entries pointing to v1 and v2, and update the index 2 entry for v3 to point to v4.

Figuring out which index tuples to remove and which ones to update can only be done by comparing each and every indexed column's old and new values. To do so, vacuum will have to fetch all the row versions, which can be scattered all over the place, with all possible locking issues including but not limited to deadlocks.

I'm not sure why vacuum can't run similarly to the way it does now.
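Jan's worked example can be made concrete with a small simulation. This is a toy sketch of the proposed scheme, with made-up names: each index keeps one entry per distinct key value, pointing at the oldest version that introduced it, and `vacuum_work` lists the index maintenance required to prune a set of versions.

```python
# Toy model of Jan's example: one row, four versions, two indexed columns.
# Under the proposed scheme each index keeps one entry per distinct key
# value, pointing at the first (oldest) version that carried that value.
versions = {"v1": ("a", "b"), "v2": ("a", "c"), "v3": ("a", "d"), "v4": ("b", "d")}
order = ["v1", "v2", "v3", "v4"]

def index_entries(col):
    """Map each distinct key value to the oldest version introducing it."""
    entries = {}
    for v in order:
        entries.setdefault(versions[v][col], v)
    return entries

def vacuum_work(removable):
    """List the index maintenance needed to remove the given versions."""
    survivors = [v for v in order if v not in removable]
    work = []
    for col in (0, 1):
        for key, v in index_entries(col).items():
            if v in removable:
                # Does any surviving version still carry this key value?
                alive = [s for s in survivors if versions[s][col] == key]
                if alive:
                    work.append(f"index{col + 1}: repoint {key!r} {v} -> {alive[0]}")
                else:
                    work.append(f"index{col + 1}: delete entry {key!r} -> {v}")
    return work
```

Pruning just v1 yields exactly the two operations Jan describes (repoint index 1's 'a' entry to v2, delete index 2's 'b' entry), and pruning v1..v3 yields his four.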
Re: [HACKERS] Buffer for inner and outer table
Hi Álvaro, thank you for your answer. I thought that Postgres could manage the space for outer and inner tables, because some articles have shown how this makes a difference. Alvaro, please, what about the count of pages that Postgres reads on an inner-join query (in RAM and on disk)? Do you know anything about that? Thanks, Daniel

Alvaro Herrera [EMAIL PROTECTED] wrote: Daniel Xavier de Sousa wrote: Can somebody tell me where Postgres controls the buffer for inner and outer tables when it executes a nested loop join? I would like to know how to change the size of this buffer and see all statistics about this.

There is no such buffer. Buffers used in scans are kept in shared_buffers, just like for everything else.

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] vacuum, performance, and MVCC
Hannu Krosing wrote: Maybe we could start from reusing the index tuples which point to invisible tuples? The index is not MVCC anyway, so maybe it is easier to do in-place replacement there? This probably has the same obstacles which have prevented us from removing those in the first place (removing instead of marking as invisible). Does it cause some locking issues? Or does it go against some other constraints of our index lookups? I think that just setting the invisible bit in an index leaf node causes nearly as much disk io as removing the node. If we could delete/reuse old index tuples, it would solve a sizable chunk of the index-growth problem, especially for cases where the referenced key value does not change. I think heap _and_ index reuse is the only useful direction. Index or heap reuse alone seems too marginal for the added complexity. Sure, but index reuse seems a lot easier, as there is nothing additional to remember or clean out when doing it.

Yes, seems so. TODO added:

* Reuse index tuples that point to heap tuples that are not visible to anyone?

When reusing a heap tuple you have to clean out all index entries pointing to it.

Well, not for an UPDATE with no key changes on the same page, if we do that.

-- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] vacuum row?
On 6/24/06, Mark Woodward [EMAIL PROTECTED] wrote: I originally suggested a methodology for preserving MVCC and everyone is confusing it as update in place; this is not what I intended.

Actually, you should've presented your idea as performing MVCC the way Firebird does... the idea is basically the same. Doing some research never hurts... especially with this crowd.

-- Jonah H. Harris, Software Architect | phone: 732.331.1300 EnterpriseDB Corporation | fax: 732.331.1301 33 Wood Ave S, 2nd Floor | [EMAIL PROTECTED] Iselin, New Jersey 08830 | http://www.enterprisedb.com/
Re: [HACKERS] vacuum, performance, and MVCC
On 6/25/2006 6:52 AM, Mark Woodward wrote: On 6/24/2006 9:23 AM, Mark Woodward wrote: On Sat, 24 Jun 2006, Mark Woodward wrote: I'm probably mistaken, but aren't there already forward references in tuples to later versions? If so, I'm only suggesting reversing the order and referencing the latest version. I thought I understood your idea, but now you lost me again. I thought what you want is that the older heap tuple has a pointer to the newer one. Which it already has, it's called t_ctid. Perfect! Can you try to explain more carefully how the whole thing would work? What would an index tuple point to? What pointers would a heap tuple have? What would an index scan do to find the row version it's interested in? What exactly would an update do?

Since we already allocate space for some notion of a linked list, all I'm suggesting is that we reverse the order, sort of. Currently it looks like this:

ver001 -> ver002 -> ver003 -> ... -> verN

That's what t_ctid does now, right? Well, that's sort of stupid. Why not have it do this:

ver001 -> verN -> ... -> ver003 -> ver002

This will speed up almost *all* queries when there are more than two versions of rows.

OK, here is the behavior of an update:

(1) Find the latest version of the row
(2) Duplicate the row and modify as per plan
(3) Set the t_ctid of the new row to the last latest
(4) Set the t_ctid of the first row to that of the new row
(5) Attempt to index the row
(6) If the first version of the row is in the index already (ver001), don't modify the index; otherwise, add the new version (just as before)

When you vacuum, simply make the latest version (verN) the key row (ver001).

This isn't done simply. Currently, vacuum collects a trivial array of ctids it is removing and every now and then does a bulk remove of the index tuples pointing to them.
Now let's consider a table with two indexed columns with the following row versions resulting from an insert and 3 updates to that same row:

v1: a,b
v2: a,c
v3: a,d
v4: b,d

In your new scheme, there would be two index tuples for column 1 (one pointing to v1, one pointing to v4) and 3 index tuples for column 2 (one for each different value, pointing to v1, v2 and v3). Did I get that right so far?

If vacuum now can remove v1, it has to update index 1 to point to v2 and remove the pointer to v1 from index 2. If it can remove v1 and v2, it has to update index 1 to point to v3 and remove v1 and v2 from index 2. If it can remove v1, v2 and v3, it must delete the index 1 tuple pointing to v1, delete the index 2 entries pointing to v1 and v2, and update the index 2 entry for v3 to point to v4.

Figuring out which index tuples to remove and which ones to update can only be done by comparing each and every indexed column's old and new values. To do so, vacuum will have to fetch all the row versions, which can be scattered all over the place, with all possible locking issues including but not limited to deadlocks.

I'm not sure why vacuum can't run similarly to the way it does now.

What's that supposed to mean?

Jan

-- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] #
Re: [HACKERS] vacuum, performance, and MVCC
On 6/25/2006 12:27 PM, Bruce Momjian wrote: Hannu Krosing wrote: Maybe we could start from reusing the index tuples which point to invisible tuples? The index is not MVCC anyway, so maybe it is easier to do in-place replacement there? This probably has the same obstacles which have prevented us from removing those in the first place (removing instead of marking as invisible). Does it cause some locking issues? Or does it go against some other constraints of our index lookups? I think that just setting the invisible bit in an index leaf node causes nearly as much disk io as removing the node. If we could delete/reuse old index tuples, it would solve a sizable chunk of the index-growth problem, especially for cases where the referenced key value does not change. I think heap _and_ index reuse is the only useful direction. Index or heap reuse alone seems too marginal for the added complexity. Sure, but index reuse seems a lot easier, as there is nothing additional to remember or clean out when doing it. Yes, seems so. TODO added: * Reuse index tuples that point to heap tuples that are not visible to anyone? When reusing a heap tuple you have to clean out all index entries pointing to it. Well, not for an UPDATE with no key changes on the same page, if we do that.

An update that results in all the same values of every indexed column of a known deleted invisible tuple. This reused tuple can by definition not be the one currently updated. So unless it is a table without a primary key, this assumes that at least 3 versions of the same row exist within the same block. How likely is that to happen?

Jan
Re: [HACKERS] vacuum, performance, and MVCC
On 6/24/2006 4:10 PM, Hannu Krosing wrote: On a fine day, Sat, 2006-06-24 at 15:44, Jan Wieck wrote: That fixes the symptom, not the problem. The problem is that performance steadily degrades over time.

No, you got it backwards. The performance degradation is the symptom. The problem is that there are too many dead tuples in the table. There is one way to solve that problem -- remove them, which is done by running vacuum.

Precisely.

There are some problems with vacuum itself, that I agree with. For example, it would be good if a long-running vacuum wouldn't affect a vacuum running on another table because of the long-running-transaction effect it has. It would be good if vacuum could be run partially over a table. It would be good if there was a way to speed up vacuum by using a dead space map or something. It would be good if vacuum wouldn't waste time on blocks that don't have any possible work in them.

Vacuum has two main purposes: A) remove dead rows and B) freeze xids. Once a block has zero deleted rows and all xids are frozen, there is nothing to do with this block, and vacuum should skip it until a transaction updates that block. This requires 2 bits per block, which is 32K per 1G segment of a heap. Clearing the bits is done when the block is marked dirty. This way vacuum would not waste any time and IO on huge, slow-changing tables. That part -- sequentially scanning huge tables that didn't change much -- is what keeps us from running vacuum every couple of seconds.

Seems like a plan.

Still, there is another problem which is not solved by the map approach alone, at least with the current implementation of vacuum. This is the fact that we need to do a full scan over the index(es) to clean up pointers to removed tuples. And huge tables tend to have huge indexes.

Right, now that you say it I remember why this wasn't as easy as it sounded at the beginning. Obviously, short of a sequential scan over the index, there is no way to find an index tuple other than doing an index scan.
So vacuum would have to estimate, based on the bitmaps, whether it could be beneficial (huge table, few vacuumable pages) to actually remove/flag single index tuples before removing the heap tuple. This can be done in advance of removing the heap tuple, because the index tuples might not be there to begin with. However, that is a very costly thing to do and not trivial to implement.

Jan
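Jan's arithmetic checks out: with the default 8 KB block size, a 1 GB segment holds 131072 blocks, and two bits each come to exactly 32 KB. A minimal sketch of such a per-block status map follows; the class and method names are invented for illustration, not PostgreSQL's format.

```python
# Back-of-the-envelope check of the 2-bits-per-block map discussed above.
BLOCK_SIZE = 8192              # default PostgreSQL heap block size
SEGMENT_SIZE = 1 << 30         # 1 GB heap segment
BITS_PER_BLOCK = 2             # bit 0: no dead rows, bit 1: all xids frozen

blocks_per_segment = SEGMENT_SIZE // BLOCK_SIZE         # 131072 blocks
map_bytes = blocks_per_segment * BITS_PER_BLOCK // 8    # 32768 bytes = 32 KB

class BlockStatusMap:
    """Both bits set means vacuum has nothing to do and can skip the block."""
    def __init__(self, nblocks):
        self.bits = bytearray((nblocks * BITS_PER_BLOCK + 7) // 8)

    def _set(self, bit, on):
        byte, off = divmod(bit, 8)
        if on:
            self.bits[byte] |= 1 << off
        else:
            self.bits[byte] &= ~(1 << off)

    def mark_clean(self, blk):
        # vacuum found zero dead rows and froze all xids on this block
        self._set(blk * 2, True)
        self._set(blk * 2 + 1, True)

    def mark_dirty(self, blk):
        # called whenever the block is modified: vacuum must revisit it
        self._set(blk * 2, False)
        self._set(blk * 2 + 1, False)

    def can_skip(self, blk):
        byte, off = divmod(blk * 2, 8)
        return (self.bits[byte] >> off) & 3 == 3
```

As Jan notes, this only helps the heap pass; the full index scans remain.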
Re: [HACKERS] vacuum, performance, and MVCC
On Sat, 24 Jun 2006, Bruce Momjian wrote: Because having them on the same page is the only way you can update the page item pointer, so when you recycle the row the indexes point to the new version. Pages look like:

[marker][item1][item2][item3]...[tuple1][tuple2][tuple3]

and indexes only point to items, not to tuples. This allows tuples to be compacted on the page without affecting the indexes. If tuple1 is updated to tuple2, once tuple1 is no longer visible to any backends, you can modify item1 to point to tuple2, and you can mark the space used by tuple1 as reusable:

[marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]

Ok, now I think I get it. So the limitation of the old and new tuple being on the same page is required to make it possible to remove the old tuple without touching the indexes? If you move the new tuple (logically, by modifying item pointers) on vacuum, isn't there a risk that a concurrent seqscan misses it?

If you can't expire the old row because one of the indexed columns was modified, I see no reason to try to reduce the additional index entries. It won't enable early expiration, but it means less work to do on update. If there are a lot of indexes, not having to add so many index tuples can be a significant saving.

To summarise, we have two issues related to frequent updates:

1. Index and heap bloat, requiring frequent vacuum.
2. Updates are more expensive than on other DBMSs, because we have to add a new index tuple in every index, even if none of the indexed columns are modified.

Tom suggested that we just improve vacuum and autovacuum, and someone brought up the dead space map idea again. Those are all worthwhile things to do and help with vacuuming after deletes as well as updates, but they only address issue 1. Mark's suggestion (assuming that it would've worked) as well as yours address both, but only for updates.

- Heikki
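The item-pointer indirection described above can be sketched as a toy model. The class and method names here are hypothetical, and the real page format is binary, not Python dicts; the point is only that an index stores a slot number, so redirecting the slot to a newer tuple version touches no index at all.

```python
# Toy model of a heap page: indexes reference item slots, not tuple
# storage, so a slot can be redirected to a newer tuple version and the
# old version's space reclaimed without touching any index entry.
class HeapPage:
    def __init__(self):
        self.items = []      # item pointers: slot number -> storage key
        self.tuples = {}     # storage key -> tuple data

    def add_tuple(self, key, data):
        """Store a tuple and return the slot number an index would record."""
        self.tuples[key] = data
        self.items.append(key)
        return len(self.items) - 1

    def redirect(self, slot, new_key):
        """Point an existing slot at a newer version and free the old one."""
        old = self.items[slot]
        self.items[slot] = new_key
        del self.tuples[old]          # old tuple's space becomes reusable

page = HeapPage()
slot = page.add_tuple("tuple1", {"col": "a"})   # index stores `slot`
page.tuples["tuple2"] = {"col": "b"}            # UPDATE writes new version
page.redirect(slot, "tuple2")                   # tuple1 no longer visible
```

After `redirect`, the same slot (and hence every index entry) now resolves to tuple2, mirroring the [item1(tuple2)] diagram above.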
Re: [HACKERS] vacuum, performance, and MVCC
Jan Wieck wrote: Sure, but index reuse seems a lot easier, as there is nothing additional to remember or clean out when doing it. Yes, seems so. TODO added: * Reuse index tuples that point to heap tuples that are not visible to anyone? When reusing a heap tuple you have to clean out all index entries pointing to it. Well, not for an UPDATE with no key changes on the same page, if we do that. An update that results in all the same values of every indexed column of a known deleted invisible tuple. This reused tuple can by definition not be the one currently updated. So unless it is a table without a primary key, this assumes that at least 3 versions of the same row exist within the same block. How likely is that to happen?

Good question. You take the current tuple and make another one on the same page. Later, an update can reuse the original tuple if it is no longer visible to anyone (by changing the item id), so you only need two tuples, not three. My hope is that a repeated update would eventually move to a page that has enough free space for two (or more) versions. Does that help explain it?

-- Bruce Momjian
Re: [HACKERS] vacuum, performance, and MVCC
Heikki Linnakangas wrote: On Sat, 24 Jun 2006, Bruce Momjian wrote: Because having them on the same page is the only way you can update the page item pointer, so when you recycle the row the indexes point to the new version. Pages look like:

[marker][item1][item2][item3]...[tuple1][tuple2][tuple3]

and indexes only point to items, not to tuples. This allows tuples to be compacted on the page without affecting the indexes. If tuple1 is updated to tuple2, once tuple1 is no longer visible to any backends, you can modify item1 to point to tuple2, and you can mark the space used by tuple1 as reusable:

[marker][item1(tuple2)][item2][item3]...[free][tuple2][tuple3]

Ok, now I think I get it. So the limitation of the old and new tuple being on the same page is required to make it possible to remove the old tuple without touching the indexes?

Yes, modifying the heap page item pointer is required for reuse.

If you move the new tuple (logically, by modifying item pointers) on vacuum, isn't there a risk that a concurrent seqscan misses it?

Well, you lock the page while updating the item pointer. Because the versions are on the same page, a single page lock should be fine.

If you can't expire the old row because one of the indexed columns was modified, I see no reason to try to reduce the additional index entries. It won't enable early expiration, but it means less work to do on update. If there are a lot of indexes, not having to add so many index tuples can be a significant saving.

Already added to TODO:

* Reuse index tuples that point to heap tuples that are not visible to anyone?

To summarise, we have two issues related to frequent updates:

1. Index and heap bloat, requiring frequent vacuum.
2. Updates are more expensive than on other DBMSs, because we have to add a new index tuple in every index, even if none of the indexed columns are modified.

Tom suggested that we just improve vacuum and autovacuum, and someone brought up the dead space map idea again.
Those are all worthwhile things to do and help with vacuuming after deletes as well as updates, but they only address issue 1. Mark's suggestion (assuming that it would've worked) as well as yours address both, but only for updates.

Agreed.

-- Bruce Momjian
Re: [HACKERS] vacuum, performance, and MVCC
On 6/25/2006 2:24 PM, Bruce Momjian wrote: Jan Wieck wrote: Sure, but index reuse seems a lot easier, as there is nothing additional to remember or clean out when doing it. Yes, seems so. TODO added: * Reuse index tuples that point to heap tuples that are not visible to anyone? When reusing a heap tuple you have to clean out all index entries pointing to it. Well, not for an UPDATE with no key changes on the same page, if we do that. An update that results in all the same values of every indexed column of a known deleted invisible tuple. This reused tuple can by definition not be the one currently updated. So unless it is a table without a primary key, this assumes that at least 3 versions of the same row exist within the same block. How likely is that to happen? Good question. You take the current tuple and make another one on the same page. Later, an update can reuse the original tuple if it is no longer visible to anyone (by changing the item id), so you only need two tuples, not three. My hope is that a repeated update would eventually move to a page that has enough free space for two (or more) versions. Does that help explain it?

That's exactly what I meant. You need space for 3 or more tuple versions within one page and the luck that one of them is invisible at the time of the update. I don't know how likely or unlikely this is in reality, but it doesn't sound very promising to me so far.

Another problem with this is that even if you find such a row, it doesn't spare you the index traversal. The dead row whose item id you're reusing might have resulted from an insert that aborted or crashed before it finished creating all index entries. Or some of its index entries might already be flagged known-dead, and you'd better reset those flags.

Jan
Re: [HACKERS] vacuum, performance, and MVCC
On a fine day, Sun, 2006-06-25 at 14:24, Bruce Momjian wrote: Jan Wieck wrote: Sure, but index reuse seems a lot easier, as there is nothing additional to remember or clean out when doing it. Yes, seems so. TODO added: * Reuse index tuples that point to heap tuples that are not visible to anyone? When reusing a heap tuple you have to clean out all index entries pointing to it. Well, not for an UPDATE with no key changes on the same page, if we do that. An update that results in all the same values of every indexed column of a known deleted invisible tuple. This reused tuple can by definition not be the one currently updated. So unless it is a table without a primary key, this assumes that at least 3 versions of the same row exist within the same block. How likely is that to happen? Good question. You take the current tuple and make another one on the same page. Later, an update can reuse the original tuple if it is no longer visible to anyone (by changing the item id), so you only need two tuples, not three. My hope is that a repeated update would eventually move to a page that has enough free space for two (or more) versions.

I can confirm that this is exactly what happens when running an update-heavy load with frequent vacuums. Eventually most rows get their own db pages or share the same page with 2-3 rows. And there will be lots of unused (filled up, or cleaned and not yet reused) pages.

The overall performance could be made a little better by tuning the system to not put more than N new rows on the same page at initial insert or when the row moves to a new page during update. Currently several new rows are initially put on the same page and then move around during repeated updates until they slow(ish)ly claim their own page.

Does that help explain it?
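The "no more than N new rows per page" tuning Hannu suggests can be sketched as a toy model. The numbers and the tuple-count framing are made up for illustration; the actual fillfactor feature works on page bytes, not row counts.

```python
# Toy model: cap how many freshly inserted rows share a page, leaving
# headroom for same-page UPDATE versions later.
PAGE_CAPACITY = 8          # tuples that physically fit on one page
FILLFACTOR = 50            # percent of each page that inserts may fill

def place_inserts(nrows):
    """Assign row numbers to pages, respecting the insert cap per page."""
    per_page = max(1, PAGE_CAPACITY * FILLFACTOR // 100)
    return [list(range(i, min(i + per_page, nrows)))
            for i in range(0, nrows, per_page)]
```

With a 50% cap, ten rows spread over three pages instead of two, so each page keeps room for in-page update chains.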
Re: [HACKERS] vacuum, performance, and MVCC
On a fine day, Sun, 2006-06-25 at 06:52, Mark Woodward wrote: I'm not sure why vacuum can't run similarly to the way it does now.

What do you mean? Currently vacuum runs a three-step process:

1) runs a full scan over the heap and collects all dead tuple ctids from the heap
2) runs full scans over all indexes of the relation and removes any pointers pointing to dead tuples
3) runs another full scan over the heap and removes the tuples in the list collected at step 1

There are no modifications done to live tuples (ok, they *may* get frozen if they are above a certain age, but this is not relevant to the current discussion).

-- Hannu Krosing
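The three steps above can be sketched as a toy model, with visibility reduced to a boolean flag and ctids to dict keys (a deliberately crude simplification of the real executor):

```python
# Minimal sketch of the three-phase vacuum: collect dead ctids, purge
# index pointers to them, then remove the heap tuples themselves.
def vacuum(heap, indexes):
    # Step 1: full heap scan, collect ctids of dead tuples
    dead = [ctid for ctid, tup in heap.items() if tup["dead"]]
    # Step 2: full scan of every index, drop pointers to dead tuples
    for idx in indexes:
        for key in [k for k, ctid in idx.items() if ctid in dead]:
            del idx[key]
    # Step 3: second heap pass, remove the collected tuples
    for ctid in dead:
        del heap[ctid]
    return dead
```

Note the ordering matters even in the sketch: index pointers must go before the heap tuples they reference, or a concurrent scan could follow a pointer into reclaimed space.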
Re: [HACKERS] vacuum, performance, and MVCC
Jan Wieck wrote: An update that results in all the same values of every indexed column of a known deleted invisible tuple. This reused tuple can by definition not be the one currently updated. So unless it is a table without a primary key, this assumes that at least 3 versions of the same row exist within the same block. How likely is that to happen? Good question. You take the current tuple and make another one on the same page. Later, an update can reuse the original tuple if it is no longer visible to anyone (by changing the item id), so you only need two tuples, not three. My hope is that a repeated update would eventually move to a page that has enough free space for two (or more) versions. Does that help explain it? That's exactly what I meant. You need space for 3 or more tuple versions within one page and the luck that one of them is invisible at the time of the update. I don't know how likely or unlikely this is in reality, but it doesn't sound very promising to me so far.

Why three? I explained using only two heap tuples:

[item1]...[tuple1]

becomes on UPDATE:

[item1]...[tuple1][tuple2]

On another UPDATE, if tuple1 is no longer visible:

[item1]...[tuple1][tuple2]

Another problem with this is that even if you find such a row, it doesn't spare you the index traversal. The dead row whose item id you're reusing might have resulted from an insert that aborted or crashed before it finished creating all index entries. Or some of its index entries might already be flagged known-dead, and you'd better reset those flags.

You can only reuse heap rows that were created and expired by committed transactions. In fact, you can only UPDATE a row that was created by a committed transaction. You cannot _reuse_ any row, but only a row that is being UPDATEd. Also, it cannot be known-dead, because we are in the process of updating it.

I am thinking my idea was not fully understood. Hopefully this email helps.
-- Bruce Momjian
Re: [HACKERS] vacuum, performance, and MVCC
Hannu Krosing wrote: On a fine day, Sun, 2006-06-25 at 14:24, Bruce Momjian wrote: Jan Wieck wrote: Sure, but index reuse seems a lot easier, as there is nothing additional to remember or clean out when doing it. Yes, seems so. TODO added: * Reuse index tuples that point to heap tuples that are not visible to anyone? When reusing a heap tuple you have to clean out all index entries pointing to it. Well, not for an UPDATE with no key changes on the same page, if we do that. An update that results in all the same values of every indexed column of a known deleted invisible tuple. This reused tuple can by definition not be the one currently updated. So unless it is a table without a primary key, this assumes that at least 3 versions of the same row exist within the same block. How likely is that to happen? Good question. You take the current tuple and make another one on the same page. Later, an update can reuse the original tuple if it is no longer visible to anyone (by changing the item id), so you only need two tuples, not three. My hope is that a repeated update would eventually move to a page that has enough free space for two (or more) versions.

I can confirm that this is exactly what happens when running an update-heavy load with frequent vacuums. Eventually most rows get their own db pages or share the same page with 2-3 rows. And there will be lots of unused (filled up, or cleaned and not yet reused) pages.

Right, that was my guess, because heavily updated rows start to move around in the table, and because UPDATE tries to stay on the same page; once the row hits a mostly-empty page, it stays there.

The overall performance could be made a little better by tuning the system to not put more than N new rows on the same page at initial insert or when the row moves to a new page during update. Currently several new rows are initially put on the same page and then move around during repeated updates until they slow(ish)ly claim their own page.
We have a fillfactor patch that will be in 8.2.

-- Bruce Momjian
Re: [HACKERS] xlog viewer proposal
Alright, I'm working on a fast prototype using the SRF.

On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote: On Fri, 2006-06-23 at 10:59 -0300, Diogo Biazus wrote: On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote: - give more flexibility for managing the xlogs remotely

Not sure what you mean.

- I think it's faster to implement and to have a working and usable tool.

Why do you think that? It sounds like you've got more work, since you effectively need to rewrite the _desc routines.

Yes, but I don't need to worry about program output, and I have the backend's memory management and error handling.

I'd suggest doing a quick prototype to allow us to evaluate which architecture would be preferable. I'm torn between the good-idea and the safe-minimal-but-definitely-in-8.2 option.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com

-- Diogo Biazus - [EMAIL PROTECTED] Móvel Consultoria http://www.movelinfo.com.br http://www.postgresql.org.br
[HACKERS] Table clustering idea
There is a well known command called CLUSTER which organizes a table in a specified index's order. It has a drawback: newly added tuples are not in this order. Last night I had an idea which could be interesting, I hope. The idea is to make use of the 'histogram_bounds' collected statistical data. Instead of inserting a row into the first suitable spot in a table, a table would be divided into sections, one for each of the histogram_bounds ranges. When inserting, the database would try to find the most suitable section to insert into (using the histogram_bounds), and if there were free spots there, would insert there. If not, it would either look for a free spot in nearby sections, or take the first suitable place. What would it do? It would try to keep the table somewhat organized, keeping rows of similar values close together (within SET STATISTICS resolution, so a common scenario would be 50 or 100 sections). It would make it a bit hard for a table to shrink (since new rows would be added throughout the table, not at the beginning). Another idea than using histogram_bounds would be using the position of the key inside the index to determine the ideal place of the row inside the table and find the closest free spot there. This would be of course much more precise and wouldn't rely on statistics. Regards, Dawid
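Dawid's placement rule can be sketched with a bisect over the bounds: pick the section whose key range covers the new row, probe neighbours when it is full, then fall back to any free spot. A hypothetical Python sketch (the bounds and free-slot counts are made-up, not the actual pg_stats format):

```python
import bisect

# histogram_bounds-style section placement: the table is split into one
# section per histogram bucket; an insert targets the section whose key
# range covers the new row's key.

bounds = [100, 200, 300, 400]                 # hypothetical histogram_bounds
free_slots = {0: 5, 1: 0, 2: 3, 3: 2, 4: 1}   # free spots per section

def target_section(key):
    # section 0 covers key < 100, section 1 covers 100..199, and so on
    return bisect.bisect_right(bounds, key)

def choose_section(key):
    """Prefer the matching section; else probe nearby, then any free one."""
    s = target_section(key)
    if free_slots.get(s, 0) > 0:
        return s
    for delta in (1, -1, 2, -2):              # look in nearby sections first
        n = s + delta
        if free_slots.get(n, 0) > 0:
            return n
    # first suitable place anywhere in the table
    return next(i for i, f in free_slots.items() if f > 0)

assert target_section(150) == 1
assert choose_section(150) == 2   # section 1 is full, nearest free neighbour wins
assert choose_section(50) == 0    # section 0 has room
```

The second idea in the message (using the key's position in the index rather than the histogram) would replace `target_section` with an index probe, trading statistics staleness for extra index traversal cost.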
Re: [HACKERS] Table clustering idea
Dawid, Other idea than using histogram_bounds would be using the position of key inside the index to determine the ideal place of row inside the table and find the closest free spot there. This would be of course much more precise and wouldn't rely on statistic. This is generally known as index organized tables in other databases. It implements a clustered storage of the table, which dramatically speeds access on the chosen indexed column and makes inserts fast. This also eases some of the visibility/MVCC implementation issues being discussed on hackers, but does not help with the maintenance of non-storage key indices. Other DBMS have index organized tables that can use either hash or btree organizations, both of which have their uses. We are planning to implement btree organized tables sometime - anyone else interested in this idea? - Luke
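The distinction Luke draws can be sketched as follows: in a heap table, rows live wherever they were inserted and a separate index maps key to location, while in an index-organized table the rows themselves are kept in key order, so a key lookup lands directly on the row. A minimal Python sketch of the idea (a sorted list standing in for btree leaf order; real implementations store rows in btree leaf pages):

```python
import bisect

class IndexOrganizedTable:
    """Rows stored sorted by primary key, standing in for btree leaf order."""
    def __init__(self):
        self.keys = []
        self.rows = []

    def insert(self, key, row):
        i = bisect.bisect_left(self.keys, key)   # keep key order on insert
        self.keys.insert(i, key)
        self.rows.insert(i, row)

    def lookup(self, key):
        i = bisect.bisect_left(self.keys, key)   # the "index" IS the table
        if i < len(self.keys) and self.keys[i] == key:
            return self.rows[i]
        return None

    def range_scan(self, lo, hi):
        """Clustered storage makes key-range scans a contiguous slice."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return self.rows[i:j]

iot = IndexOrganizedTable()
for k in (30, 10, 20, 40):
    iot.insert(k, f"row-{k}")
assert iot.lookup(20) == "row-20"
assert iot.range_scan(10, 30) == ["row-10", "row-20", "row-30"]
```

The `range_scan` slice is the payoff of clustering; the cost, as the message notes, is that secondary ("non-storage key") indexes still need their own maintenance and must reference rows by key rather than by a stable heap position.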
Re: [HACKERS] Table clustering idea
Luke, Other DBMS have index organized tables that can use either hash or btree organizations, both of which have their uses. We are planning to implement btree organized tables sometime - anyone else interested in this idea? Search the archives. It's been discussed on this list at least twice before, that I know of. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Re: [HACKERS] vacuum, performance, and MVCC
bruce wrote: Why three? I explained using only two heap tuples:

[item1]...[tuple1]

becomes on UPDATE:

[item1]...[tuple1][tuple2]

and on another UPDATE, if tuple1 is no longer visible (the item id now points at tuple2):

[item1]...[tuple1][tuple2]

Here is some pseudo-code that implements this:

Definition: Single-Index-Tuple Chain (CITC)

On UPDATE:
    Do the old and new UPDATE rows have unchanged indexed columns?
        Is the old row a member of a CITC?
            Point the item id at the first visible tuple in the chain
            Mark previous invisible CITC tuples as free space
        Does the page have free space?
            Add the new tuple on the same page as the old
            Mark the old tuple as a CITC member
            Do not create index entries

VACUUM would have to be taught about CITC, and CREATE INDEX would have to create entries in other indexes for cases where its new indexed columns change inside a CITC. Conceptually, CITC allows a single index entry to point to multiple UPDATEd rows, allowing non-visible tuples to be recycled (and reused for future UPDATEs) without affecting the index. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
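A toy model of the CITC idea (a hypothetical sketch, not PostgreSQL internals; it assumes the old version becomes invisible immediately, whereas in reality that depends on active snapshots): one item id serves a chain of row versions on the same page, UPDATEs with unchanged indexed columns add a version without touching the index, and dead versions are recycled by repointing the item id.

```python
# Toy CITC model: a page holds tuple versions; a single item id (line
# pointer) anchors a chain of same-key versions, so the index needs only
# one entry no matter how many times the row is updated in place.

class Page:
    def __init__(self):
        self.tuples = []        # [value, visible] versions, oldest first
        self.item_id = None     # offset the index entry points at

    def insert(self, value):
        self.tuples.append([value, True])
        self.item_id = 0
        return 1                # one index entry created

    def update_citc(self, value):
        """UPDATE with unchanged indexed columns: extend chain, no index entry."""
        # simplification: the old version is assumed invisible right away
        self.tuples[-1][1] = False
        self.tuples.append([value, True])
        # repoint the item id at the first visible tuple and reclaim dead ones
        self.tuples = [t for t in self.tuples if t[1]]
        self.item_id = 0
        return 0                # zero new index entries

page = Page()
index_entries = page.insert("v1")
for v in ("v2", "v3", "v4"):
    index_entries += page.update_citc(v)
assert index_entries == 1                 # index untouched by the updates
assert page.tuples == [["v4", True]]      # dead versions were recycled
```

The point of the sketch is the invariant in the two asserts: repeated same-key updates cost no index insertions and no net page growth, which is exactly the index-bloat relief the proposal is after.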
Re: [HACKERS] vacuum, performance, and MVCC
On 6/25/2006 5:18 PM, Bruce Momjian wrote: Jan Wieck wrote: An update that results in all the same values of every indexed column of a known deleted invisible tuple. This reused tuple can by definition not be the one currently updated. So unless it is a table without a primary key, this assumes that at least 3 versions of the same row exist within the same block. How likely is that to happen? Good question. You take the current tuple, and make another one on the same page. Later, an update can reuse the original tuple if it is no longer visible to anyone (by changing the item id), so you only need two tuples, not three. My hope is that a repeated update would eventually move to a page that has enough free space for two (or more) versions. Does that help explain it? That's exactly what I meant. You need space for 3 or more tuple versions within one page and the luck that one of them is invisible at the time of the update. I don't know how likely or unlikely this is in reality, but it doesn't sound very promising to me so far. Why three? I explained using only two heap tuples: For some reason I counted in the new tuple ... sorry for that. Yes, it can work with two tuples.

[item1]...[tuple1]

becomes on UPDATE:

[item1]...[tuple1][tuple2]

and on another UPDATE, if tuple1 is no longer visible (the item id now points at tuple2):

[item1]...[tuple1][tuple2]

Another problem with this is that even if you find such a row, it doesn't spare you the index traversal. The dead row whose item id you're reusing might have resulted from an insert that aborted or crashed before it finished creating all index entries. Or some of its index entries might already be flagged known dead, and you better reset those flags. You can only reuse heap rows that were created and expired by committed transactions. In fact, you can only UPDATE a row that was created by a committed transaction. You cannot _reuse_ any row, but only a row that is being UPDATEd. Also, it cannot be known dead because we are in the process of updating it.
Now you lost me. What do you mean a row that is being UPDATEd? The row (version) being UPDATEd right now cannot be expired, or why would you update that one? And if your transaction rolls back later, the row you update right now must be the one surviving. Any row that was created by a committed transaction does indeed have all the index entries created. But if it is deleted and expired, that means that the transaction that stamped xmax has committed and is outside of every existing snapshot. You can only reuse a slot that is used by a tuple that satisfies the vacuum snapshot. And a tuple that satisfies that snapshot has potentially index entries flagged known dead. I am thinking my idea was not fully understood. Hopefully this email helps. I must be missing something because I still don't see how it can work. Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] #
Re: [HACKERS] vacuum, performance, and MVCC
Jan Wieck wrote:

[item1]...[tuple1]

becomes on UPDATE:

[item1]...[tuple1][tuple2]

and on another UPDATE, if tuple1 is no longer visible (the item id now points at tuple2):

[item1]...[tuple1][tuple2]

Another problem with this is that even if you find such a row, it doesn't spare you the index traversal. The dead row whose item id you're reusing might have resulted from an insert that aborted or crashed before it finished creating all index entries. Or some of its index entries might already be flagged known dead, and you better reset those flags. You can only reuse heap rows that were created and expired by committed transactions. In fact, you can only UPDATE a row that was created by a committed transaction. You cannot _reuse_ any row, but only a row that is being UPDATEd. Also, it cannot be known dead because we are in the process of updating it. Now you lost me. What do you mean a row that is being UPDATEd? The row (version) being UPDATEd right now cannot be expired, or why would you update that one? And if your transaction rolls back later, the row you update right now must be the one surviving. It can only be a non-visible version of the row earlier in the UPDATE chain, not the actual one being updated. Any row that was created by a committed transaction does indeed have all the index entries created. But if it is deleted and expired, that means that the transaction that stamped xmax has committed and is outside of every existing snapshot. You can only reuse a slot that is used by a tuple that satisfies the vacuum snapshot. And a tuple that satisfies that snapshot has potentially index entries flagged known dead. When you are using update chaining, you can't mark that index row as dead because it actually points to more than one row on the page, some are non-visible, some are visible. I am thinking my idea was not fully understood. Hopefully this email helps. I must be missing something because I still don't see how it can work. I just posted pseudo-code. Hope that helps.
-- Bruce Momjian [EMAIL PROTECTED] EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] vacuum, performance, and MVCC
On 6/25/2006 10:12 PM, Bruce Momjian wrote: When you are using update chaining, you can't mark that index row as dead because it actually points to more than one row on the page, some are non-visible, some are visible. Back up the truck ... you mean in the current code base we have heap tuples that are visible in index scans because of heap tuple chaining but without index tuples pointing directly at them? Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] #
Re: [HACKERS] vacuum, performance, and MVCC
Jan Wieck wrote: On 6/25/2006 10:12 PM, Bruce Momjian wrote: When you are using update chaining, you can't mark that index row as dead because it actually points to more than one row on the page, some are non-visible, some are visible. Back up the truck ... you mean in the current code base we have heap tuples that are visible in index scans because of heap tuple chaining but without index tuples pointing directly at them? I don't know where this idea came from, but it's not true. All heap tuples, dead or otherwise, have index entries. Unless the idea is to extend update chaining to mean something different from the current meaning. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support