Re: [HACKERS] Including Snapshot Info with Indexes
On a fine day, Sat, 2007-10-20 at 10:19, Luke Lonergan wrote: Hi Hannu, On 10/14/07 12:58 AM, Hannu Krosing [EMAIL PROTECTED] wrote: What has happened in reality is that the speed difference between CPU, RAM and disk speeds has _increased_ tremendously Yes. which makes it even more important to _decrease_ the size of stored data if you want good performance Or bring the CPU processing closer to the data it's using (or both). By default, the trend you mention first will continue in an unending way - the consequence is that the distance between a processor and its target data will continue to increase ad infinitum. The emergence of solid-state (flash) disks may help a little here, but in general it is true. By contrast, you can only decrease the data volume so much - so in the end you'll be left with the same problem - the data needs to be closer to the processing. This is the essence of parallel / shared-nothing architecture. Note that we've done this at Greenplum. We're also implementing a DSM-like capability and are investigating a couple of different hybrid row / column store approaches. Have you tried moving the whole visibility part of tuples out to a separate heap? Especially in OLAP/ETL scenarios the distribution of tuples loaded in one transaction should be very good for visibility-info compression. I'd suspect that you could crush hundreds of pages' worth of visibility into a single RLE encoding unit (xmin=N, xmax=not_yet, start_ctid=X, end_ctid=Y), and it will stay in L1 cache most of the time you process the corresponding relation. And the relation itself will be smaller, and index-only (actually index-only + lookup inside L1 cache) access can happen, and so on.
OTOH, if you load it in millions of small transactions, you can run VACUUM FREEZE _on_ the visibility heap only, which will make all visibility info look similar and thus RLE-compressible, and again make it fit in L1 cache, if you don't have lots of failed loads interleaved with successful ones. Bitmap index with index-only access does provide nearly all of the advantages of a column store from a speed standpoint BTW. Even though Vertica is touting speed advantages - our parallel engine plus bitmap index will crush them in benchmarks when they show up with real code. Meanwhile they're moving on to new ideas - I kid you not, Horizontica is Dr. Stonebraker's new idea :-) Sounds like a result of a marketroid brainstorming session :P So - bottom line - some ideas from column store make sense, but it's not a cure-all. There is also a MonetDB/X100 project, which tries to make MonetDB order(s) of magnitude faster by doing in-page compression in order to get even more performance, see: Actually, the majority of the points made by the MonetDB team involve decreasing the abstractions in the processing path to improve the IPC (instructions per clock) efficiency of the executor. The X100 part was about doing in-page compression, so the efficiency of the disk to L1 cache pathway would increase; so for 2:1 compression the CPU would get twice the data throughput. We are also planning to do this by operating on data in vectors of projected rows in the executor, which will increase the IPC by reducing I-cache misses and improving D-cache locality. Tight loops will make a much bigger difference when long runs of data are the target operands. - Luke ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
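[Editor's sketch] Hannu's idea above - one RLE entry (xmin, xmax, start_ctid, end_ctid) covering a whole bulk-loaded range - can be sketched as a small lookup structure. All names here (VisRun, VisibilityHeap) are invented for illustration; this is not PostgreSQL code, just a toy model of the compression he describes, with ctids flattened to integers.

```python
from dataclasses import dataclass
from bisect import bisect_right
from typing import Optional

@dataclass(frozen=True)
class VisRun:
    xmin: int             # inserting transaction id
    xmax: Optional[int]   # deleting transaction id, None = still live
    start_ctid: int       # first tuple position covered (flattened ctid)
    end_ctid: int         # last tuple position covered

class VisibilityHeap:
    """Maps a ctid to its visibility run via binary search.
    Millions of tuples loaded in one transaction collapse to a single
    entry, so the whole structure can stay cache-resident."""
    def __init__(self, runs):
        self.runs = sorted(runs, key=lambda r: r.start_ctid)
        self.starts = [r.start_ctid for r in self.runs]

    def lookup(self, ctid):
        i = bisect_right(self.starts, ctid) - 1
        if i >= 0 and self.runs[i].start_ctid <= ctid <= self.runs[i].end_ctid:
            return self.runs[i]
        return None

# Two bulk loads -> two entries cover 2,000,000 tuples.
heap = VisibilityHeap([
    VisRun(xmin=100, xmax=None, start_ctid=0,         end_ctid=999_999),
    VisRun(xmin=101, xmax=None, start_ctid=1_000_000, end_ctid=1_999_999),
])
```

The point of the sketch: visibility for any tuple in a bulk-loaded relation resolves through two tiny arrays, which is why the "fits in L1 cache" claim is plausible for OLAP/ETL load patterns.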
Re: [HACKERS] Including Snapshot Info with Indexes
Hi, I have tested with making this change and it is showing useful readings. The point of introducing the indexes with snapshot is that it should reduce the number of logical I/Os (these may be served from memory or from hard disk). Logical I/Os are potential physical I/Os. On 10/20/07, Martijn van Oosterhout [EMAIL PROTECTED] wrote: On Sat, Oct 20, 2007 at 09:24:07AM +0530, Gokulakannan Somasundaram wrote: Hi, I think I have an initial implementation. It has some bugs and I am working on fixing it. But to show the advantages, I want to show the number of logical I/Os on the screen. In order to show that, I tried enabling the log_statement option in postgresql.conf. But it shows only the physical reads. What I wanted was a logical reads count (no. of ReadBuffer calls, which is stored in the ReadBufferCount variable). So I have added this stat to bufmgr.c (the function is BufferUsage, I suppose) to show logical reads and physical reads. Is this an acceptable change? I'm not sure if the number of logical reads is really a useful measurement. I can imagine there are places that deliberately read the block logically a few times but drop the pin in between to allow others access. This will skew your results, as in actual usage only the first is likely to generate a real I/O. If they have dropped the pin to allow other accesses, then the buffer may lose its place in memory. So it might become a physical I/O, of course at a lower probability. But still, if we think of this from an SQL tuner's perspective, he is going to change the query slightly, or add/remove indexes, in order to verify whether he has improved the query performance. Can we say that he has improved the performance 99% of the time, if the SQL fired has reduced the logical I/Os? If your problem is cache it seems to me you should test with a table larger than your shared buffers and perhaps even larger than your total memory, since this is the case we're actually interested in.
In this case we may not know which rows of the table are in which block. Say we fire a query which does an index scan; it might have referred to some table block. We can't say for sure that if I change some value in the index scan, it won't touch the same table block. That solution is perfect if we have to do a load test / performance test, but for SQL tuning, running a load test is slightly costly. Even if the statistic doesn't become useful in some cases, we can safely ignore it. I will submit my initial patch today. Thanks, Gokul.
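[Editor's sketch] The distinction being debated - every ReadBuffer call counts as a logical read, but only a buffer-pool miss counts as a physical read - can be modeled in a few lines. The names (BufferStats, read_buffer) and the LRU policy are invented for illustration; PostgreSQL's actual bufmgr.c is more elaborate.

```python
class BufferStats:
    def __init__(self):
        self.logical_reads = 0    # every ReadBuffer call
        self.physical_reads = 0   # only calls that missed the cache

class BufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = {}  # block number -> page; dict order doubles as LRU order

    def read_buffer(self, blkno, stats):
        stats.logical_reads += 1
        if blkno not in self.pages:
            stats.physical_reads += 1                     # cache miss: real I/O
            if len(self.pages) >= self.capacity:
                self.pages.pop(next(iter(self.pages)))    # evict least recent
        else:
            self.pages.pop(blkno)                         # refresh LRU position
        self.pages[blkno] = object()

stats = BufferStats()
pool = BufferPool(capacity=2)
for blk in [1, 2, 1, 3, 1]:
    pool.read_buffer(blk, stats)
# 5 logical reads; 3 of them (first touches of 1, 2, 3) are physical
```

This also shows Martijn's objection concretely: whether a repeated logical read turns physical depends entirely on whether the block was evicted in between, which the logical count alone cannot tell you.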
Re: [HACKERS] Including Snapshot Info with Indexes
On Sat, Oct 20, 2007 at 09:24:07AM +0530, Gokulakannan Somasundaram wrote: Hi, I think I have an initial implementation. It has some bugs and I am working on fixing it. But to show the advantages, I want to show the number of logical I/Os on the screen. In order to show that, I tried enabling the log_statement option in postgresql.conf. But it shows only the physical reads. What I wanted was a logical reads count (no. of ReadBuffer calls, which is stored in the ReadBufferCount variable). So I have added this stat to bufmgr.c (the function is BufferUsage, I suppose) to show logical reads and physical reads. Is this an acceptable change? I'm not sure if the number of logical reads is really a useful measurement. I can imagine there are places that deliberately read the block logically a few times but drop the pin in between to allow others access. This will skew your results, as in actual usage only the first is likely to generate a real I/O. If your problem is cache it seems to me you should test with a table larger than your shared buffers and perhaps even larger than your total memory, since this is the case we're actually interested in. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] Including Snapshot Info with Indexes
Hi Hannu, On 10/14/07 12:58 AM, Hannu Krosing [EMAIL PROTECTED] wrote: What has happened in reality is that the speed difference between CPU, RAM and disk speeds has _increased_ tremendously Yes. which makes it even more important to _decrease_ the size of stored data if you want good performance Or bring the CPU processing closer to the data it's using (or both). By default, the trend you mention first will continue in an unending way - the consequence is that the distance between a processor and its target data will continue to increase ad infinitum. By contrast, you can only decrease the data volume so much - so in the end you'll be left with the same problem - the data needs to be closer to the processing. This is the essence of parallel / shared-nothing architecture. Note that we've done this at Greenplum. We're also implementing a DSM-like capability and are investigating a couple of different hybrid row / column store approaches. Bitmap index with index-only access does provide nearly all of the advantages of a column store from a speed standpoint BTW. Even though Vertica is touting speed advantages - our parallel engine plus bitmap index will crush them in benchmarks when they show up with real code. Meanwhile they're moving on to new ideas - I kid you not, Horizontica is Dr. Stonebraker's new idea :-) So - bottom line - some ideas from column store make sense, but it's not a cure-all. There is also a MonetDB/X100 project, which tries to make MonetDB order(s) of magnitude faster by doing in-page compression in order to get even more performance, see: Actually, the majority of the points made by the MonetDB team involve decreasing the abstractions in the processing path to improve the IPC (instructions per clock) efficiency of the executor. We are also planning to do this by operating on data in vectors of projected rows in the executor, which will increase the IPC by reducing I-cache misses and improving D-cache locality.
Tight loops will make a much bigger difference when long runs of data are the target operands. - Luke
Re: [HACKERS] Including Snapshot Info with Indexes
Hi, I think I have an initial implementation. It has some bugs and I am working on fixing it. But to show the advantages, I want to show the number of logical I/Os on the screen. In order to show that, I tried enabling the log_statement option in postgresql.conf. But it shows only the physical reads. What I wanted was a logical reads count (no. of ReadBuffer calls, which is stored in the ReadBufferCount variable). So I have added this stat to bufmgr.c (the function is BufferUsage, I suppose) to show logical reads and physical reads. Is this an acceptable change? I thought a logical read count would be helpful, even for SQL tuning, since if someone wants to tune the SQL on a test system, things might get cached and he wouldn't know how much I/O his SQL is potentially capable of. Maybe we can add a statistic to show how many of those ReadBuffers are pinned buffers. Expecting your comments. Thanks, Gokul. On 10/14/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: On 10/14/07, Trevor Talbot [EMAIL PROTECTED] wrote: On 10/14/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: http://www.databasecolumn.com/2007/09/one-size-fits-all.html The Vertica database (Monet is an open source version with the same principle) makes use of the very same principle: use more disk space, since disks are less costly, and optimize for data warehousing. What I meant there was, it has duplicated storage of certain columns of the table. A table with more than one projection always needs more space than a table with just one projection. By doing this they are reducing the number of disk operations. If they are duplicating columns of data to avoid reading unnecessary information, we are duplicating the snapshot information to avoid going to the table. Was this about Vertica or MonetDB? I saw that article a while ago, and I didn't see anything that suggested Vertica duplicated data, just that it organized it differently on disk. What are you seeing as being duplicated?
Hi Trevor, This is a good paper to read about the basics of column-oriented databases: http://db.lcs.mit.edu/projects/cstore/vldb.pdf If you go to Section 2 - Data Model, he has shown the data model with a sample EMP table. The example shows that the EMP table contains four columns - Name, Age, Dept, Salary. From this table, projections are formed (in the paper, they show the creation of four projections for Example 1): EMP1 (name, age) EMP2 (dept, age, DEPT.floor) EMP3 (name, salary) DEPT1 (dname, floor) As you can see, the same column information gets duplicated in different projections. The advantage is that if a query is around name and age, it need not skim around other details. But the storage requirements go up, since there is redundancy. As you may know, if you increase data redundancy, it will help selects at the cost of inserts, updates and deletes. This is what I was trying to say. Thanks, Gokul.
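[Editor's sketch] The redundancy claim for the EMP projections above is easy to quantify. The per-column byte widths below are illustrative guesses, not figures from the C-Store paper; the point is only that any column appearing in two projections (here name and age) is paid for twice per row.

```python
# Illustrative column widths (bytes per value); not from the paper.
col_bytes = {"name": 32, "age": 4, "dept": 16, "salary": 8, "floor": 4}

# The EMP projections from Section 2 of the C-Store paper.
projections = [
    ["name", "age"],            # EMP1
    ["dept", "age", "floor"],   # EMP2
    ["name", "salary"],         # EMP3
]

def bytes_per_row(projs, widths):
    """Total storage per logical row across all projections."""
    return sum(widths[c] for p in projs for c in p)

single_copy = sum(col_bytes.values())             # one copy of each column: 64
redundant = bytes_per_row(projections, col_bytes) # projected layout: 100
# name and age are stored twice, so selects get cheaper while every
# insert/update/delete must maintain all three copies.
```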
Re: [HACKERS] Including Snapshot Info with Indexes
On a fine day, Sat, 2007-10-13 at 17:44, Gokulakannan Somasundaram wrote: Hi, I went through this article and it was good. Please have a look at it. http://www.databasecolumn.com/2007/09/one-size-fits-all.html This article was written by Michael Stonebraker, considered to be the founder of our database. He has mentioned that the DBMSs designed in the 1970s haven't changed according to the change that has happened in the hardware landscape. What has happened in reality is that the speed difference between CPU, RAM and disk speeds has _increased_ tremendously, which makes it even more important to _decrease_ the size of stored data if you want good performance The Vertica database (Monet is an open source version with the same principle) makes use of the very same principle: use more disk space, since disks are less costly, and optimize for data warehousing. MonetDB is not about using more disk to get better performance, but about reducing the need to read unused data and increasing the speed by that. There is also a MonetDB/X100 project, which tries to make MonetDB order(s) of magnitude faster by doing in-page compression in order to get even more performance, see: http://homepages.cwi.nl/~boncz/x100.html http://www.cwi.nl/themes/ins1/publications/docs/ZuBoNeHe:DEBULL:05.pdf Even otherwise, we are recommending indexes with snapshot as an option. We are not replacing the current index scheme. So if someone feels that his database should run on less disk space, let him create the normal index. If he feels he can afford more redundant disk space, then he can create indexes with snapshots. We are reducing random I/Os at the cost of extra disk space, so that's definitely a gain. But tech folks like us can better decide on something based on experiments, as Tom has pointed out. So let's see whether indexes with snapshot are worth the trade-off in space. Agreed. Hannu
Re: [HACKERS] Including Snapshot Info with Indexes
On Saturday, 13 October 2007, Gokulakannan Somasundaram wrote: Even otherwise, we are recommending indexes with snapshot as an option. We are not replacing the current index scheme. So if someone feels that his database should run on less disk space, let him create the normal index. If he feels he can afford more redundant disk space, then he can create indexes with snapshots. We are reducing random I/Os at the cost of extra disk space, so that's definitely a gain. But tech folks like us can better decide on something based on experiments, as Tom has pointed out. So let's see whether indexes with snapshot are worth the trade-off in space. There's also LucidDB [1], another open source column-based database. But if you look at the features section on their web page, you'll see they use page-level multi-versioning. So they are avoiding the need to store snapshot information for each tuple; I think that has to be kept in mind. I'd really like PostgreSQL to gain some features a la column-based databases, so the administrator could choose how he wants to use the database, but I don't think we'll be able to compete with them if they store snapshot information per page and we're storing it per tuple, for example. So any step in this direction will probably mean understanding the decisions they've made in their architectures. [1] http://www.luciddb.org/ -- Albert Cervera i Areny http://www.NaN-tic.com
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/14/07, Hannu Krosing [EMAIL PROTECTED] wrote: On a fine day, Sat, 2007-10-13 at 17:44, Gokulakannan Somasundaram wrote: Hi, I went through this article and it was good. Please have a look at it. http://www.databasecolumn.com/2007/09/one-size-fits-all.html This article was written by Michael Stonebraker, considered to be the founder of our database. He has mentioned that the DBMSs designed in the 1970s haven't changed according to the change that has happened in the hardware landscape. What has happened in reality is that the speed difference between CPU, RAM and disk speeds has _increased_ tremendously, which makes it even more important to _decrease_ the size of stored data if you want good performance The Vertica database (Monet is an open source version with the same principle) makes use of the very same principle: use more disk space, since disks are less costly, and optimize for data warehousing. MonetDB is not about using more disk to get better performance, but about reducing the need to read unused data and increasing the speed by that. There is also a MonetDB/X100 project, which tries to make MonetDB order(s) of magnitude faster by doing in-page compression in order to get even more performance, see: http://homepages.cwi.nl/~boncz/x100.html http://www.cwi.nl/themes/ins1/publications/docs/ZuBoNeHe:DEBULL:05.pdf What I meant there was, it has duplicated storage of certain columns of the table. A table with more than one projection always needs more space than a table with just one projection. By doing this they are reducing the number of disk operations. If they are duplicating columns of data to avoid reading unnecessary information, we are duplicating the snapshot information to avoid going to the table. Even otherwise, we are recommending indexes with snapshot as an option. We are not replacing the current index scheme. So if someone feels that his database should run on less disk space, let him create the normal index.
If he feels he can afford more redundant disk space, then he can create indexes with snapshots. We are reducing random I/Os at the cost of extra disk space, so that's definitely a gain. But tech folks like us can better decide on something based on experiments, as Tom has pointed out. So let's see whether indexes with snapshot are worth the trade-off in space. Agreed. And one more piece of good news for people who are following this thread: it seems we won't take a hit on update performance if the indexes are not updated. The BTStack remains the same for the old and new tuples, if the index tuple is not updated. But I don't know whether I will be able to put that tuning (re-using the BTStack) in the first patch. So indexes with snapshots will be degrading the performance only for deletes, and only for those updates which update the index tuple. I think the delete overhead can be ruled out for those who work with partitions, since they usually drop the partitions. Thanks, Gokul.
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram [EMAIL PROTECTED] writes: So Indexes with snapshots will be degrading the performance only for deletes and only those updates, which are updating the index tuple. Deletes never update indexes in Postgres. Increasing the size of the index would affect vacuum, inserts, and index accesses. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/14/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: http://www.databasecolumn.com/2007/09/one-size-fits-all.html The Vertica database (Monet is an open source version with the same principle) makes use of the very same principle: use more disk space, since disks are less costly, and optimize for data warehousing. What I meant there was, it has duplicated storage of certain columns of the table. A table with more than one projection always needs more space than a table with just one projection. By doing this they are reducing the number of disk operations. If they are duplicating columns of data to avoid reading unnecessary information, we are duplicating the snapshot information to avoid going to the table. Was this about Vertica or MonetDB? I saw that article a while ago, and I didn't see anything that suggested Vertica duplicated data, just that it organized it differently on disk. What are you seeing as being duplicated? (This is orthogonal to the current thread; I'm just curious.)
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/14/07, Gregory Stark [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram [EMAIL PROTECTED] writes: So Indexes with snapshots will be degrading the performance only for deletes and only those updates, which are updating the index tuple. Deletes never update indexes in Postgres. Increasing the size of the index would affect vacuum, inserts, and index accesses. In the new proposal, deletes are going to update indexes. So it's a trade-off between selects and deletes, since selects may not need to go to the table to check visibility. You may go through this thread to get more details.
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/14/07, Trevor Talbot [EMAIL PROTECTED] wrote: On 10/14/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: http://www.databasecolumn.com/2007/09/one-size-fits-all.html The Vertica database (Monet is an open source version with the same principle) makes use of the very same principle: use more disk space, since disks are less costly, and optimize for data warehousing. What I meant there was, it has duplicated storage of certain columns of the table. A table with more than one projection always needs more space than a table with just one projection. By doing this they are reducing the number of disk operations. If they are duplicating columns of data to avoid reading unnecessary information, we are duplicating the snapshot information to avoid going to the table. Was this about Vertica or MonetDB? I saw that article a while ago, and I didn't see anything that suggested Vertica duplicated data, just that it organized it differently on disk. What are you seeing as being duplicated? Hi Trevor, This is a good paper to read about the basics of column-oriented databases: http://db.lcs.mit.edu/projects/cstore/vldb.pdf If you go to Section 2 - Data Model, he has shown the data model with a sample EMP table. The example shows that the EMP table contains four columns - Name, Age, Dept, Salary. From this table, projections are formed (in the paper, they show the creation of four projections for Example 1): EMP1 (name, age) EMP2 (dept, age, DEPT.floor) EMP3 (name, salary) DEPT1 (dname, floor) As you can see, the same column information gets duplicated in different projections. The advantage is that if a query is around name and age, it need not skim around other details. But the storage requirements go up, since there is redundancy. As you may know, if you increase data redundancy, it will help selects at the cost of inserts, updates and deletes. This is what I was trying to say. Thanks, Gokul.
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/12/07, Tom Lane [EMAIL PROTECTED] wrote: Andreas Joseph Krogh [EMAIL PROTECTED] writes: Will $SUBJECT make it possible for count(*) to use an index? This is a much wanted feature. If you mean count(*) will become instantaneous, no it won't. It would get faster, but probably not by more than a factor of 10 or so, corresponding to the size ratio between index and table. Remember that the proposal is going to bloat indexes pretty badly: a simple index on an int4 column will double in size, more or less. So that ratio will be much less favorable than it is today. I accept that the indexes will be bigger for this approach. You might need more disk space and you might need more memory to accommodate the same amount of information. But I think disk costs and memory costs have come down a lot; people can afford to buy more disk and memory. But the one fact that remains true is that the disk, the last mechanical device, is slow at addressing random I/Os. So this proposal is aimed at occupying more memory and disk space to reduce random I/Os. Say we are accommodating 200 tuples per index page in today's index (for a typical configuration); as you said, in the worst case (only one index field) we will be accommodating 100 tuples per index page. But we would take away probably 100 random I/Os (say with a bitmap scan it reduces to 75). 1GB of memory is around $100 and 1GB of disk is around $1, but one hour of a performance tuner would cost around $200 :)). So that's the trade-off for the enterprises, if they want to shift between the two indexes. Personally I think the bloat problem will doom this entire approach. The distributed costs of that will outweigh the speedups that can be achieved for specific queries. The OP is free to try to prove this fear wrong, but no amount of discussion will constitute such a proof; only a testable implementation. regards, tom lane
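[Editor's sketch] The arithmetic behind Tom's "double in size" concern can be made concrete with ballpark numbers: 8 KB pages, roughly 20 bytes per plain int4 btree entry (index tuple plus line pointer), and roughly 20 extra bytes of xmin/xmax/infomask-style snapshot fields per entry. The exact byte counts are illustrative guesses, not measurements.

```python
PAGE = 8192        # standard PostgreSQL page size in bytes
PAGE_HEADER = 24   # approximate per-page header overhead

def entries_per_page(entry_bytes):
    """How many fixed-size index entries fit on one page."""
    return (PAGE - PAGE_HEADER) // entry_bytes

plain = entries_per_page(20)          # ordinary int4 index entry
with_snapshot = entries_per_page(40)  # entry plus ~20 bytes of snapshot info
# Half as many entries per page means ~2x the leaf pages, so range scans
# and cache pressure both double; the saving must come from the heap
# visits the snapshot info lets you skip.
```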
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram [EMAIL PROTECTED] writes: I accept that the indexes will be bigger in size for this approach. You might need more disk-space and you might need more memory to accomodate the same amount of information. But i think disk costs and memory costs have come down a lot, People can afford to buy more disk and memory. That's not how it works. We're not generally worried about people running out of disk or memory resources. But no matter how cheap they get people will only have what they have. We have to worry about running as fast as possible for a *given* amount of RAM or disk. Generally raising disk space usage results in a corresponding increase in run time. So an index that takes twice as much space on disk will consume twice as much time to consult as one that doesn't. You need to save enough time elsewhere to make that up and then some to make it worthwhile. I think we are pretty set on having the DSM for vacuuming purposes so you'll also have to argue this approach will cover enough additional cases or be better in some other way compared to using the DSM to be a win. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Including Snapshot Info with Indexes
Hi, I went through this article and it was good. Please have a look at it. http://www.databasecolumn.com/2007/09/one-size-fits-all.html This article was written by Michael Stonebraker, considered to be the founder of our database. He has mentioned that the DBMSs designed in the 1970s haven't changed according to the change that has happened in the hardware landscape. The Vertica database (Monet is an open source version with the same principle) makes use of the very same principle: use more disk space, since disks are less costly, and optimize for data warehousing. Even otherwise, we are recommending indexes with snapshot as an option. We are not replacing the current index scheme. So if someone feels that his database should run on less disk space, let him create the normal index. If he feels he can afford more redundant disk space, then he can create indexes with snapshots. We are reducing random I/Os at the cost of extra disk space, so that's definitely a gain. But tech folks like us can better decide on something based on experiments, as Tom has pointed out. So let's see whether indexes with snapshot are worth the trade-off in space. Thanks, Gokul. On 10/13/07, Gregory Stark [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram [EMAIL PROTECTED] writes: I accept that the indexes will be bigger for this approach. You might need more disk space and you might need more memory to accommodate the same amount of information. But I think disk costs and memory costs have come down a lot; people can afford to buy more disk and memory. That's not how it works. We're not generally worried about people running out of disk or memory resources. But no matter how cheap they get, people will only have what they have. We have to worry about running as fast as possible for a *given* amount of RAM or disk. Generally, raising disk space usage results in a corresponding increase in run time.
So an index that takes twice as much space on disk will consume twice as much time to consult as one that doesn't. You need to save enough time elsewhere to make that up and then some to make it worthwhile. I think we are pretty set on having the DSM for vacuuming purposes so you'll also have to argue this approach will cover enough additional cases or be better in some other way compared to using the DSM to be a win. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/13/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: I accept that the indexes will be bigger for this approach. You might need more disk space and you might need more memory to accommodate the same amount of information. But I think disk costs and memory costs have come down a lot; people can afford to buy more disk and memory. But the one fact that remains true is that the disk, the last mechanical device, is slow at addressing random I/Os. So this proposal is aimed at occupying more memory and disk space to reduce random I/Os. Say we are accommodating 200 tuples per index page in today's index (for a typical configuration); as you said, in the worst case (only one index field) we will be accommodating 100 tuples per index page. But we would take away probably 100 random I/Os (say with a bitmap scan it reduces to 75). 1GB of memory is around $100 and 1GB of disk is around $1, but one hour of a performance tuner would cost around $200 :)). So that's the trade-off for the enterprises, if they want to shift between the two indexes. I disagree. Even with the latest on-disk size enhancements, Postgres still has a substantially larger on-disk footprint than pretty much every other database. Like it or not, additional I/O costs are not something that should just be dismissed. Disregarding fundamental database issues (like increased I/O) leads me to believe that you don't have much experience tuning databases. As I have a bit of experience adding visibility to the indexes, I stand behind DSM. From an analytical standpoint, and given Postgres' MVCC design, it would be hard to beat a properly designed DSM in this area. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/13/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: Hi, I went through this article and it was good. Please have a look at it. http://www.databasecolumn.com/2007/09/one-size-fits-all.html This article was written by Michael Stonebraker, considered to be the founder of our database. He has mentioned that the DBMSs designed in the 1970s haven't changed according to the change that has happened in the hardware landscape. The Vertica database (Monet is an open source version with the same principle) makes use of the very same principle: use more disk space, since disks are less costly, and optimize for data warehousing. Sorry, but quoting Stonebraker (who has a *financial* interest in his statement) is pointless. Similarly, you can't directly compare his concepts to your case. IMHO, you're trying to get buy-in. In this group, unless you have a patch that proves something, you're generally out of luck. My suggestion is to start coding. You will find, as I did, that DSM is a better solution. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837 | http://www.enterprisedb.com/
Re: [HACKERS] Including Snapshot Info with Indexes
Hi All, So I think we are clear now that it is possible to have an index with snapshot info. Let me try to enumerate the uses of having the index with snapshot info, in comparison to the Dead Space Map. a) The Dead Space Map, if its implementation succeeds at what it claims, will be able to point out that all the tuples of certain chunks are frozen, for registered relations and registered chunks. There will be a lot of blocks which don't fall under this category. i) For example, if records have just been inserted, that block can't be marked as containing all frozen tuples. On 10/11/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: As explained, if we are going to include the snapshot with indexes, vacuum will be done on the index independently of the table, so vacuum will not depend on immutability. We need to go to the index from the table when we want to update the snapshot info. The problem at hand is that some user-defined functions are mutable even though the user might mark them immutable. So my idea is to have a mapping index, with the tuple id as the first column and the function's values as subsequent columns. I have a somewhat detailed design in mind. So there will be an overhead of three extra I/Os for update/delete on indexes based on user-defined functions. But this setup will speed up a lot of queries where the tables are partitioned and the workload is mostly inserts and selects, with partitions dropped at periodic intervals. Updates become costlier by 3 I/Os per index with snapshot. So if someone has more selects than updates+deletes, this index might come in handy (of course, not with user-defined functional indexes). I think you need to explain why that is better than using the Dead Space Map. We're going to want the DSM anyway, to speed up VACUUMs; enabling index-only scans just came as an afterthought.
While a DSM designed just for speeding up vacuums might look slightly different from one used for index-only scans, the infrastructure is roughly the same. What you're proposing sounds a lot more complex, less space-efficient, and slower to update. It requires extra action from the DBA, and it covers exactly the same use case (more selects than updates+deletes, to use your words). It would require changes to all index access methods, while the DSM would automatically work with all of them. In particular, including visibility information in a bitmap index, should we have bitmap indexes in the future, is impossible, while the DSM approach would just work. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Including Snapshot Info with Indexes
Hi All, The last mail was sent by mistake, without completion. I apologize for that; I am continuing it here. So I think we are clear now that it is possible to have an index with snapshot info. Let me try to enumerate the uses of having the index with snapshot info, in comparison to the Dead Space Map. a) The Dead Space Map, if its implementation succeeds at what it claims, will be able to point out that all the tuples of certain chunks are frozen, for registered relations and registered chunks. There will be a lot of blocks which don't fall under this category. i) For example, if records have just been inserted, that block can't be marked as containing all frozen tuples. ii) Imagine the case where a batch job or a huge select query runs in an enterprise for more than six hours. All the blocks inserted into the tables during this period might not be able to get the advantage of the DSM. On 10/11/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: As explained, if we are going to include the snapshot with indexes, vacuum will be done on the index independently of the table, so vacuum will not depend on immutability. We need to go to the index from the table when we want to update the snapshot info. The problem at hand is that some user-defined functions are mutable even though the user might mark them immutable. So my idea is to have a mapping index, with the tuple id as the first column and the function's values as subsequent columns. I have a somewhat detailed design in mind. So there will be an overhead of three extra I/Os for update/delete on indexes based on user-defined functions. But this setup will speed up a lot of queries where the tables are partitioned and the workload is mostly inserts and selects, with partitions dropped at periodic intervals. Updates become costlier by 3 I/Os per index with snapshot.
So if someone has more selects than updates+deletes, this index might come in handy (of course, not with user-defined functional indexes). I think you need to explain why that is better than using the Dead Space Map. We're going to want the DSM anyway, to speed up VACUUMs; enabling index-only scans just came as an afterthought. While a DSM designed just for speeding up vacuums might look slightly different from one used for index-only scans, the infrastructure is roughly the same. What you're proposing sounds a lot more complex, less space-efficient, and slower to update. It requires extra action from the DBA, and it covers exactly the same use case (more selects than updates+deletes, to use your words). It would require changes to all index access methods, while the DSM would automatically work with all of them. In particular, including visibility information in a bitmap index, should we have bitmap indexes in the future, is impossible, while the DSM approach would just work. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Including Snapshot Info with Indexes
Hi All, The last two mails were sent by mistake, without completion. I couldn't curse my system any further. I apologize for that. Coming back to the topic of discussion: So I think we are clear now that it is possible to have an index with snapshot info. Let me try to enumerate the uses of having the index with snapshot info, in comparison to the Dead Space Map. a) The Dead Space Map, if its implementation succeeds at what it claims, will be able to point out that all the tuples of certain chunks are frozen, for registered relations and registered chunks. There will be a lot of blocks which don't fall under this category. i) For example, if records have just been inserted, that block can't be marked as containing all frozen tuples. ii) Imagine the case where a batch job or a huge select query runs in an enterprise for more than six hours. All the blocks inserted into the tables during this period might not be able to get the advantage of the DSM. iii) Imagine the case for which I am actually proposing the index with snapshot info: partitioned tables. Every time a new table gets created, it has to be registered in the Dead Space Map. This requires more maintenance on the DBA's side. iv) I understand the DeadSpaceLock to be a global lock (if one transaction is updating the dead space for any unregistered chunk, no one else can query the Dead Space Map). If my statement is right, then partitioned tables might not be able to benefit from the DSM. We have to remember that for tables with daily partitions this would prove to be a nightmare. Other than that, there are a lot of advantages I foresee in including the snapshot with indexes: i) Vacuuming of these indexes need not be done with super-exclusive locks. It is possible to design a strategy to vacuum these indexes with exclusive locks on pages. ii) The above would mean that the index can stay in operation while the vacuum is happening. iii) As we have already stated, it provides an efficient clustering of related data.
iv) The reverse mapping index, if present, provides an efficient solution to the retail vacuum problem. So HOT can be improved further, with no need for the restriction that the updated tuple be on the same page. v) Updates on tables with primary keys (where the primary key is not updated) will be able to resolve the unique constraint faster. This is a minor advantage. The complexity of the reverse mapping index will only be there for user-defined functional indexes, and these functions can be pruned further to find the obviously immutable ones. I expect your valuable feedback on this. Thanks, Gokul. On 10/12/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: Hi All, The last mail was sent by mistake, without completion. I apologize for that; I am continuing it here. So I think we are clear now that it is possible to have an index with snapshot info. Let me try to enumerate the uses of having the index with snapshot info, in comparison to the Dead Space Map. a) The Dead Space Map, if its implementation succeeds at what it claims, will be able to point out that all the tuples of certain chunks are frozen, for registered relations and registered chunks. There will be a lot of blocks which don't fall under this category. i) For example, if records have just been inserted, that block can't be marked as containing all frozen tuples. ii) Imagine the case where a batch job or a huge select query runs in an enterprise for more than six hours. All the blocks inserted into the tables during this period might not be able to get the advantage of the DSM. On 10/11/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: As explained, if we are going to include the snapshot with indexes, vacuum will be done on the index independently of the table, so vacuum will not depend on immutability. We need to go to the index from the table when we want to update the snapshot info.
The problem at hand is that some user-defined functions are mutable even though the user might mark them immutable. So my idea is to have a mapping index, with the tuple id as the first column and the function's values as subsequent columns. I have a somewhat detailed design in mind. So there will be an overhead of three extra I/Os for update/delete on indexes based on user-defined functions. But this setup will speed up a lot of queries where the tables are partitioned and the workload is mostly inserts and selects, with partitions dropped at periodic intervals. Updates become costlier by 3 I/Os per index with snapshot. So if someone has more selects than updates+deletes, this index might come in handy (of course, not with user-defined functional indexes). I think you need to explain why that is better than using the Dead Space Map. We're going to
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote: The last two mails were sent by mistake, without completion. I couldn't curse my system any further :-) a) The Dead Space Map, if its implementation succeeds at what it claims, will be able to point out that all the tuples of certain chunks are frozen, for registered relations and registered chunks. There will be a lot of blocks which don't fall under this category. i) For example, if records have just been inserted, that block can't be marked as containing all frozen tuples. If records have just been inserted into a block, it is in cache. Therefore hitting that block to check visibility isn't going to cost much. There might be some middle ground where a tuple was inserted a while ago, so the block has already been evicted from cache, but the transaction hasn't yet committed; that's a pretty narrow use case, though. Note that a page can be flagged in the DSM not only by VACUUM but by any backend, as soon as all tuples on it are visible to everyone. You do have to scan the tuple headers on the page to determine that, but that's not much overhead, and it could be offloaded to the bgwriter. ii) Imagine the case where a batch job or a huge select query runs in an enterprise for more than six hours. All the blocks inserted into the tables during this period might not be able to get the advantage of the DSM. Yep, true. A long-running transaction like that is problematic anyway, because we can't vacuum away any dead rows generated during that period. iii) Imagine the case for which I am actually proposing the index with snapshot info: partitioned tables. Every time a new table gets created, it has to be registered in the Dead Space Map. This requires more maintenance on the DBA's side. Why do you think that the DBA needs to register tables in the DSM manually? Surely that would happen automatically.
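Heikki's point that any backend can flag a page once all its tuples are visible to everyone can be sketched as follows. This is a toy model, not PostgreSQL code: the real visibility check is considerably more involved, and the tuple-header layout here is invented for illustration.

```python
def page_all_visible(tuple_headers, oldest_xmin, committed):
    """tuple_headers: list of (xmin, xmax) pairs; xmax None = never deleted.
    True if every tuple is visible to every current and future snapshot."""
    for xmin, xmax in tuple_headers:
        # inserter must be committed and older than any snapshot still held
        if xmin not in committed or xmin >= oldest_xmin:
            return False
        # a committed deleter means the tuple is not visible to everyone
        if xmax is not None and xmax in committed:
            return False
    return True

committed = {90, 95}
print(page_all_visible([(90, None), (95, None)], 100, committed))  # True
print(page_all_visible([(90, 95)], 100, committed))                # False: tuple deleted
```

An aborted deleter (an xmax not in the committed set) does not block the flag, which matches the intent: only a committed insert-and-never-deleted history makes a page "all visible".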
iv) I understand the DeadSpaceLock to be a global lock (if one transaction is updating the dead space for any unregistered chunk, no one else can query the Dead Space Map). If my statement is right, then partitioned tables might not be able to benefit from the DSM. We have to remember that for tables with daily partitions this would prove to be a nightmare. The patch submitted for 8.3 did use a global lock and a fixed-size shared memory area, but those were exactly the reasons the patch was rejected. It will have to be reworked for 8.4. Other than that, there are a lot of advantages I foresee in including the snapshot with indexes: i) Vacuuming of these indexes need not be done with super-exclusive locks. It is possible to design a strategy to vacuum these indexes with exclusive locks on pages. I'm not convinced that's true. We only need super-exclusive locks on index pages for interlocking index and heap accesses with non-MVCC snapshots, IOW system tables. And since the lock is only held for a short time and infrequently, it hasn't been a problem at all. ii) The above would mean that the index can stay in operation while the vacuum is happening. Huh? VACUUM hasn't locked out other access since version 7.2! iii) As we have already stated, it provides an efficient clustering of related data. Sorry, I missed that part. What's that? iv) The reverse mapping index, if present, provides an efficient solution to the retail vacuum problem. So HOT can be improved further, with no need for the restriction that the updated tuple be on the same page. v) Updates on tables with primary keys (where the primary key is not updated) will be able to resolve the unique constraint faster. This is a minor advantage. The complexity of the reverse mapping index will only be there for user-defined functional indexes. The *run-time* complexity of that will only be there for UDF indexes, but the *code* complexity will always be there.
Sorry, I think the probability of a reverse mapping index being accepted into Postgres is very close to zero. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Including Snapshot Info with Indexes
Will $SUBJECT make it possible for count(*) to use an index? This is a much-wanted feature. -- Andreas Joseph Krogh [EMAIL PROTECTED], Senior Software Developer / Manager, OfficeNet AS, Karenslyst Allé 11, PO. Box 529 Skøyen, 0214 Oslo, NORWAY. Tlf: +47 24 15 38 90, Fax: +47 24 15 38 91, Mobile: +47 909 56 963. "The most difficult thing in the world is to know how to do a thing and to watch somebody else doing it wrong, without comment."
Re: [HACKERS] Including Snapshot Info with Indexes
Andreas Joseph Krogh [EMAIL PROTECTED] writes: Will $SUBJECT make it possible for count(*) to use index? This is a much wanted feature. If you mean count(*) will become instantaneous, no it won't. It would get faster, but probably not by more than a factor of 10 or so, corresponding to the size ratio between index and table. Remember that the proposal is going to bloat indexes pretty badly: a simple index on an int4 column will double in size, more or less. So that ratio will be much less favorable than it is today. Personally I think the bloat problem will doom this entire approach. The distributed costs of that will outweigh the speedups that can be achieved for specific queries. The OP is free to try to prove this fear wrong, but no amount of discussion will constitute such a proof; only a testable implementation. regards, tom lane
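Tom's "double in size, more or less" can be reproduced with rough arithmetic. The byte counts below are approximations for illustration (8-byte index tuple header, 4-byte line pointer, 8-byte alignment, and a guessed ~16 bytes of visibility data), not exact PostgreSQL structure sizes.

```python
def maxalign(n, align=8):
    """Round n up to the next multiple of the platform alignment."""
    return -(-n // align) * align

LINE_POINTER = 4   # per-entry item pointer on the page
ITUP_HEADER = 8    # index tuple header
KEY = 4            # int4 key
VISIBILITY = 16    # assumed: xmin + xmax + flags, roughly heap-header-sized

today = LINE_POINTER + maxalign(ITUP_HEADER + KEY)                  # 20 bytes/entry
proposed = LINE_POINTER + maxalign(ITUP_HEADER + KEY + VISIBILITY)  # 36 bytes/entry
print(today, proposed, proposed / today)  # 20 36 1.8 -- close to doubling
```

With narrower alignment or a larger visibility payload the ratio moves toward or past 2x, which is presumably what "more or less" covers.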
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/12/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: If records have just been inserted into a block, it is in cache. Therefore hitting that block to check visibility isn't going to cost much. There might be some middle ground where a tuple was inserted a while ago, so the block has already been evicted from cache, but the transaction hasn't yet committed; that's a pretty narrow use case, though. Note that a page can be flagged in the DSM not only by VACUUM but by any backend, as soon as all tuples on it are visible to everyone. You do have to scan the tuple headers on the page to determine that, but that's not much overhead, and it could be offloaded to the bgwriter. The first step in any database tuning, of course, is to reduce random I/Os. But the next step is to reduce logical I/Os: whether the I/O is satisfied from disk or from memory, we should try to reduce access to a shared resource as much as possible. So even if the newly inserted tuples are in memory or on disk, restricting access to them would improve the overall performance of the system (that space can be taken over by other blocks). In the overall picture, the scalability of the database increases as we reduce the number of locks being taken. Yep, true. A long-running transaction like that is problematic anyway, because we can't vacuum away any dead rows generated during that period. It is not problematic for indexes with snapshot info; they will keep working as usual. And I think one reason why timestamp-based databases got an advantage over locking-based databases is that batch jobs can run together with OLTP transactions. To maintain that advantage in PostgreSQL, indexes with snapshot info help. Why do you think that the DBA needs to register tables in the DSM manually? Surely that would happen automatically. Accepted.
The patch submitted for 8.3 did use a global lock and a fixed-size shared memory area, but those were exactly the reasons the patch was rejected. It will have to be reworked for 8.4. OK, then the best-case rework, to my understanding, would be to include the bitmap DSM block number in the locking attributes. But still, one DSM block would map to 8192 * 8 = 65536 table blocks. So if you have to add an unregistered chunk of a newly created partition, any access to those 65536 blocks has to stall, which may not be acceptable under OLTP performance requirements. This becomes a performance overhead for people maintaining daily partitions: they create a new table every day, the table grows from morning till evening, and that same table is queried the most. I'm not convinced that's true. We only need super-exclusive locks on index pages for interlocking index and heap accesses with non-MVCC snapshots, IOW system tables. And since the lock is only held for a short time and infrequently, it hasn't been a problem at all. For a heap with no indexes, we don't take a super-exclusive lock to vacuum it. We just take an exclusive lock on each block and vacuum it, because the table contains the necessary visibility information. But with indexes, we may need to refer to the table in order to vacuum, and in the meanwhile we don't want any page splits to happen. That's why we take a super-exclusive lock on all the leaf pages (nobody should even have a pin on one of them; ref: the README file in the nbtree directory). But if we have the snapshot info in the indexes, then we can just do an index scan (similar to the heap scan described before), taking an exclusive lock on pages one by one and vacuuming them. ii) The above would mean that the index can stay in operation while the vacuum is happening. Huh? VACUUM hasn't locked out other access since version 7.2! I might have missed something here.
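The coverage figure in the message is straightforward to verify: with one visibility bit per heap page, a single 8192-byte DSM page describes 8192 * 8 = 65536 heap pages, i.e. half a gigabyte of table behind each lockable DSM page.

```python
BLOCK_SIZE = 8192                          # default PostgreSQL page size
heap_pages_per_dsm_page = BLOCK_SIZE * 8   # one visibility bit per heap page
bytes_covered = heap_pages_per_dsm_page * BLOCK_SIZE

print(heap_pages_per_dsm_page)             # 65536
print(bytes_covered // (1024 ** 2))        # 512 (MB of heap per DSM page)
```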
If we need a super-exclusive lock on all leaf pages in the index to vacuum it (no one should even have a pin on them), then how are we able to allow index scans while the vacuum is happening? In my case, the index will have the snapshot information, so no super-exclusive locks are needed and other leaf pages can be accessed. If there is an explanation, it might also be useful to add it to the README file in the nbtree directory. iii) As we have already stated, it provides an efficient clustering of related data. Sorry, I missed that part. What's that? Say I create an index on (salesperson id, date, customer name, details of the transaction) on a table. For a query like "select customer name, details of the transaction from table where salespersonid = 'xyz' order by date", the entire query is satisfied by the index; we will not go to the table. In short, the index acts like an IOT in Oracle, or a clustered index in other databases: the necessary information is obtained from one place, since the snapshot is stored in the index. It will be very useful, especially when the query is going to return more records. The
Re: [HACKERS] Including Snapshot Info with Indexes
Andreas Joseph Krogh wrote: Will $SUBJECT make it possible for count(*) to use index? This is a much wanted feature. Yes, both the DSM approach and the approach proposed by Gokul. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Including Snapshot Info with Indexes
On Friday 12 October 2007 11:49:17 Heikki Linnakangas wrote: Andreas Joseph Krogh wrote: Will $SUBJECT make it possible for count(*) to use index? This is a much wanted feature. Yes, both the DSM approach and the approach proposed by Gokul. Good. -- Andreas Joseph Krogh, OfficeNet AS
Re: [HACKERS] Including Snapshot Info with Indexes
I agree with that. I will go ahead and do a test implementation and share the results with everyone. Thanks, Gokul. On 10/12/07, Tom Lane [EMAIL PROTECTED] wrote: Andreas Joseph Krogh [EMAIL PROTECTED] writes: Will $SUBJECT make it possible for count(*) to use index? This is a much wanted feature. If you mean count(*) will become instantaneous, no it won't. It would get faster, but probably not by more than a factor of 10 or so, corresponding to the size ratio between index and table. Remember that the proposal is going to bloat indexes pretty badly: a simple index on an int4 column will double in size, more or less. So that ratio will be much less favorable than it is today. Personally I think the bloat problem will doom this entire approach. The distributed costs of that will outweigh the speedups that can be achieved for specific queries. The OP is free to try to prove this fear wrong, but no amount of discussion will constitute such a proof; only a testable implementation. regards, tom lane
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote: As explained, if we are going to include the snapshot with indexes, vacuum will be done on the index independently of the table, so vacuum will not depend on immutability. We need to go to the index from the table when we want to update the snapshot info. The problem at hand is that some user-defined functions are mutable even though the user might mark them immutable. So my idea is to have a mapping index, with the tuple id as the first column and the function's values as subsequent columns. I have a somewhat detailed design in mind. So there will be an overhead of three extra I/Os for update/delete on indexes based on user-defined functions. But this setup will speed up a lot of queries where the tables are partitioned and the workload is mostly inserts and selects, with partitions dropped at periodic intervals. Updates become costlier by 3 I/Os per index with snapshot. So if someone has more selects than updates+deletes, this index might come in handy (of course, not with user-defined functional indexes). I think you need to explain why that is better than using the Dead Space Map. We're going to want the DSM anyway, to speed up VACUUMs; enabling index-only scans just came as an afterthought. While a DSM designed just for speeding up vacuums might look slightly different from one used for index-only scans, the infrastructure is roughly the same. What you're proposing sounds a lot more complex, less space-efficient, and slower to update. It requires extra action from the DBA, and it covers exactly the same use case (more selects than updates+deletes, to use your words). It would require changes to all index access methods, while the DSM would automatically work with all of them. In particular, including visibility information in a bitmap index, should we have bitmap indexes in the future, is impossible, while the DSM approach would just work.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/9/07, Florian G. Pflug [EMAIL PROTECTED] wrote: Andrew Dunstan wrote: Florian G. Pflug wrote: I think you're overly pessimistic here ;-) This classification can be done quite efficiently as long as your language is static enough. The trick is not to execute the function, but to scan the code to find all other functions and SQL statements a given function may possibly call. If your function calls no SQL statements, and only other functions already marked IMMUTABLE, then it must be IMMUTABLE itself. It does seem that only pl/pgsql is static enough for this to work, though, making this idea rather unappealing. How would you propose to analyse C functions, for which you might not have the C code? Scanning the binary, together with symbol annotations for immutability, of course ;-)) No, seriously. I do *not* advocate that we actually auto-classify functions, for a lot of reasons. I just wanted to refute the statement that doing so is generally impossible; it's not. It's trivial for some languages (in Haskell, for example, all functions that don't use monads are immutable, and their signatures tell you whether they use monads or not), realistic for others (pl/pgsql, where we do have the source code), and utterly impossible for the rest (pl/{ruby,python,perl,...}, pl/c, ...). Besides, AFAICS *anything* that makes VACUUM depend on IMMUTABLE to be correct would instantly break tsearch, no? At least as long as we allow changing the stopwords and the like of dictionaries used by an index, which we'd better allow, unless we want the DBAs to come after us with pitchforks... regards, Florian Pflug, who shudders when imagining DBAs with pitchforks... As explained, if we are going to include the snapshot with indexes, vacuum will be done on the index independently of the table, so vacuum will not depend on immutability. We need to go to the index from the table when we want to update the snapshot info.
The problem at hand is that some user-defined functions are mutable even though the user might mark them immutable. So my idea is to have a mapping index, with the tuple id as the first column and the function's values as subsequent columns. I have a somewhat detailed design in mind. So there will be an overhead of three extra I/Os for update/delete on indexes based on user-defined functions. But this setup will speed up a lot of queries where the tables are partitioned and the workload is mostly inserts and selects, with partitions dropped at periodic intervals. Updates become costlier by 3 I/Os per index with snapshot. So if someone has more selects than updates+deletes, this index might come in handy (of course, not with user-defined functional indexes). I hope that in the future there will be more ways to establish the immutability of user-defined functional indexes, so the requirement for the mapping index will go down. Expecting your comments. Thanks, Gokul.
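Florian's classification rule, "calls no SQL statements, and only other functions already marked IMMUTABLE", amounts to a fixed-point computation over the call graph. A toy sketch of that idea (the function names and the `runs_sql` flags below are made up for illustration; nothing here is PostgreSQL code):

```python
def infer_immutable(calls, runs_sql, known_immutable):
    """calls: fn -> list of callees; runs_sql: fn -> executes SQL or opaque code.
    Grow the immutable set until no more functions can be added (fixed point)."""
    known = set(known_immutable)
    changed = True
    while changed:
        changed = False
        for fn, callees in calls.items():
            if fn in known or runs_sql.get(fn, False):
                continue
            if all(c in known for c in callees):  # every callee already proven
                known.add(fn)
                changed = True
    return known

calls = {"area": ["mul"], "stamped": ["now"], "mul": [], "now": []}
runs_sql = {"now": True}                 # now() reads state we cannot analyse
result = infer_immutable(calls, runs_sql, known_immutable={"mul"})
print(sorted(result))                    # ['area', 'mul']: 'stamped' stays unproven
```

Note the conservatism Florian relies on: anything the analysis cannot see through (here, `now`) stays unclassified, and so does everything that calls it.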
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/8/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: I am always slightly late in understanding things. Let me try to understand the use of the DSM. It is a bitmap index over whether all the tuples in a particular block are visible to all backends, and whether a particular block contains tuples which are invisible to everyone. But I think this will be subject to the same limitations as a bitmap index. Even Oracle suggests the use of bitmap indexes only for data-warehousing tables, where the bitmap indexes are dropped and recreated after every bulk load. That is not a viable alternative for OLTP transactions. Well, it's not quite the same as a bitmap index, though both use a bitmap. You didn't quite get into details on what the limitations are and why it wouldn't be suitable for OLTP, but I don't see any significant problems. But I think I am late in the game, as I haven't participated in those discussions. Better late than never :). One bitmap index block usually maps to a lot of blocks in the heap, so locking one page to update the DSM for an update/delete/insert would hurt concurrency. But again, all these are my observations w.r.t. Oracle bitmap indexes; maybe I am missing something in the DSM. Yeah, the DSM page could become a contention bottleneck. My current thinking is that we'd have a flag in the heap page header that would be set together with the bit in the DSM. When the flag in the page header is set, you don't need to lock and update the DSM, because you know the bit is already set. Vacuum would have to clear both the DSM bit and the flag. It matters where the index scan will go. If the index scan is going to touch the DSM to understand visibility, this might degrade the performance of some index scans, if they have to wait to acquire the share lock only to learn that they have to go to the heap to satisfy their visibility requirements.
Meanwhile, if vacuum or inserts/updates/deletes are holding the BUFFER_EXCLUSIVE lock on it, this would hurt select transactions. Since there is only one bit per block in the DSM (best case), there might be one DSM block per 65536 table blocks, and all the transactions accessing those blocks will be waiting on that one DSM block. If we are going to update the heap page header and ask the index scan to refer to that, then there is no reduction in random I/Os. Can't we say that if the snapshot info is embedded in the index, we avoid all these difficulties? Most importantly, it won't affect the performance of current Postgres in any way. Let's take up retail vacuuming again. A user-defined function which returns different values at different times can be classified as a non-deterministic function, and we can say that this kind of index cannot be created on a non-deterministic function. This is the way it is implemented in Oracle: they have classified certain built-in operators and functions as deterministic, and a few others as non-deterministic. Can we follow a similar approach? We already do. A function must be marked as IMMUTABLE in order to use it in an index expression. But we can't enforce that a user-defined function really behaves like an immutable function should. If someone creates a user-defined function in C that calls the C random() function, we can't stop it. A function is said to be deterministic if it returns the same value for the same arguments, no matter how many times it is invoked. I think this definition clearly puts the random function in the non-deterministic category. If we have such a classification, do you think we can resolve this issue? As I said earlier, using an index like that will of course lead to bogus results, but it won't currently cause any server crashes or more serious corruption. One more final word on unique indexes.
Whenever we are doing an update, there will be insertions into the unique indexes which will trigger table lookups. Ofcourse there is more probability, that the table block would be in memory(un-pinned). Still contention for a shared resource is avoided, if the snapshot info is stored with the indexes. Let me get one more clarification, what would be type of performance results with this implementation, that would encourage the hackers community to accept the extra maintenance overhead. Thanks, Gokul.
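[Editorial note: Heikki's page-header-flag scheme above can be modeled in a few lines. This is a toy Python sketch of the idea, not PostgreSQL code; all names (DeadSpaceMap, flags, etc.) are invented for illustration. The point it demonstrates is that writers only need to touch (and lock) the shared DSM page on the first modification after a vacuum, because the per-page header flag mirrors the DSM bit.]

```python
# Toy model: a per-heap-page header flag mirrors the DSM bit, so an
# insert/update/delete only locks the shared DSM page when the flag says
# the bit is still set -- not on every modification.

class DeadSpaceMap:
    def __init__(self, n_pages):
        self.bits = [False] * n_pages    # True = "no visibility check needed"
        self.flags = [False] * n_pages   # per-heap-page header flag (mirror)
        self.dsm_lock_acquisitions = 0   # how often the shared DSM page is locked

    def vacuum_page(self, page):
        # Vacuum sets the DSM bit and the header flag together.
        self.dsm_lock_acquisitions += 1
        self.bits[page] = True
        self.flags[page] = True

    def modify_tuple(self, page):
        # DML clears the bit only if the header flag says it is still set;
        # otherwise no DSM access (and no DSM lock) is needed at all.
        if self.flags[page]:
            self.dsm_lock_acquisitions += 1
            self.bits[page] = False
            self.flags[page] = False

dsm = DeadSpaceMap(1)
dsm.vacuum_page(0)
for _ in range(1000):
    dsm.modify_tuple(0)          # only the first call locks the DSM
print(dsm.dsm_lock_acquisitions)  # 2: once for vacuum, once for the first modify
```

So under this scheme the contention Gokul worries about is limited to the first modification of each page after a vacuum, rather than every insert/update/delete hitting the shared DSM block.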
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/8/07, Florian G. Pflug [EMAIL PROTECTED] wrote:

Gokulakannan Somasundaram wrote: Hi Heikki, I am always slightly late in understanding things. Let me try to understand the use of DSM. It is a bitmap index on whether all the tuples in a particular block are visible to all the backends, and whether a particular block contains tuples which are invisible to everyone. But I think this will be subject to the same limitations as a bitmap index. Even Oracle suggests the use of bitmap indexes only for data-warehousing tables, where the bitmap indexes will be dropped and recreated after every bulk load. This is not a viable alternative for OLTP transactions. But I think I am late in the game, as I haven't participated in those discussions.

While the DSM might be similar in spirit to a bitmap index, the actual implementation has a lot more freedom, I'd say, since you can tailor it exactly to the need of tracking some summarized visibility info. So not all shortcomings of bitmap indices must necessarily apply to the DSM as well. But of course that's mostly handwaving...

One bitmap index block usually maps to a lot of blocks in the heap, so locking one page to update the DSM for an update/delete/insert would hurt concurrency. But again, all these are my observations w.r.t. Oracle bitmap indexes. Maybe I am missing something in DSM.

A simple DSM would probably contain a bit per page that says "all xmin < GlobalXmin, and all xmax unset or aborted". That bit would only get set during VACUUM, and only unset during INSERT/UPDATE/DELETE. If setting it is protected by a VACUUM-grade lock on the page, we might get away with no locking during the unset, making the locking overhead pretty small.

Let me try to understand. Do you mean to say some kind of test-and-set implementation for insert/update/delete, so that there won't be any lock during the change of the bit flags? Why do we need a lock to set it, then? It looks like a great idea.

I couldn't find that piece of discussion in the archive which covers the design of retail vacuum, so please advise me again here.

Let's take up retail vacuuming again. A user-defined function which returns different values at different times can be classified as a non-deterministic function. We can say that an index cannot be created on a non-deterministic function. This is the way it is implemented in Oracle: they have classified certain built-in operators and functions as deterministic, and likewise classified a few as non-deterministic. Can we follow a similar approach?

Postgres already distinguishes VOLATILE, STABLE and IMMUTABLE functions. It doesn't, however, risk physical data corruption even if you get that classification wrong. The worst that happens, AFAIK, is wrong query results - but fixing your function, followed by a REINDEX, always corrects the problem. If you start poking holes into that safety net, there'll be a lot of pushback, I believe - and IMHO rightly so, because people do, and always will, get such classifications wrong.

A deterministic function is classified as one which returns the same results irrespective of how many times it is invoked. So if we form a classification like that, do you think we will resolve the issue of retail vacuum? In the case of user-defined functions, the user should be defining them as deterministic. Can we frame a set of guidelines, or maybe some test procedure, which can declare a certain function as deterministic? I am just saying this off the top of my head. Even otherwise, if we restrict this indexing to only built-in deterministic functions, don't you think it would help the cause of a majority? I have made the proposal to create the index with snapshot an optional one. Thanks, Gokul.
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/9/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: On 10/8/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

Gokulakannan Somasundaram wrote: I am always slightly late in understanding things. Let me try to understand the use of DSM. It is a bitmap index on whether all the tuples in a particular block are visible to all the backends, and whether a particular block contains tuples which are invisible to everyone. But I think this will be subject to the same limitations as a bitmap index. Even Oracle suggests the use of bitmap indexes only for data-warehousing tables, where the bitmap indexes will be dropped and recreated after every bulk load. This is not a viable alternative for OLTP transactions.

Well, it's not quite the same as a bitmap index, though both use a bitmap. You didn't quite get into details on what the limitations are and why it wouldn't be suitable for OLTP, but I don't see any significant problems.

But I think I am late in the game, as I haven't participated in those discussions.

Better late than never :).

One bitmap index block usually maps to a lot of blocks in the heap, so locking one page to update the DSM for an update/delete/insert would hurt concurrency. But again, all these are my observations w.r.t. Oracle bitmap indexes. Maybe I am missing something in DSM.

Yeah, the DSM page could become a contention bottleneck. My current thinking is that we'd have a flag in the heap page header that would be set together with the bit in the DSM. When the flag in the page header is set, you don't need to lock and update the DSM because you know the bit is already set. Vacuum would have to clear both the DSM bit and the flag.

It matters to us where the index scan will go. If the index scan is going to touch the DSM to determine visibility, this might degrade the performance of some index scans: they may have to wait to acquire the share lock, only to learn that they have to go to the heap anyway to check their visibility requirements. In the meanwhile, if vacuum or inserts/updates/deletes are holding the BUFFER_EXCLUSIVE lock on that page, this would hurt the SELECT transactions. Since there is only one bit per block in the DSM (best case), there might be one DSM block per 8000 table blocks, and all the transactions accessing those 8000 blocks will be waiting on this one DSM block. If we are instead going to update the heap page header and ask the index scan to refer to that, then there is no reduction in random I/Os. Can't we say that if the snapshot info is embedded in the index, we can avoid all these difficulties? Most importantly, it won't affect the performance of current Postgres in any way.

Let's take up retail vacuuming again. A user-defined function which returns different values at different times can be classified as a non-deterministic function. We can say that an index cannot be created on a non-deterministic function. This is the way it is implemented in Oracle: they have classified certain built-in operators and functions as deterministic, and likewise classified a few as non-deterministic. Can we follow a similar approach?

We already do. A function must be marked as IMMUTABLE in order to use it in an index expression. But we can't enforce that the user-defined function really behaves like an immutable function should. If someone creates a user-defined function in C that calls the C random() function, we can't stop it.

A function is said to be deterministic if it returns the same value irrespective of how many times it is invoked. I think this definition clearly puts the random function in the non-deterministic category. If we have such a classification, do you think we can resolve this issue? If we frame a set of guidelines/test procedure, do you think it might solve the issue? Even if we don't allow this type of indexing for anything other than built-in deterministic functions, I feel it would serve most of the indexing requirements.

As I said earlier, using an index like that will of course lead to bogus results. But it won't currently cause any server crashes or more serious corruption.

One more final word on unique indexes. Whenever we are doing an update, there will be insertions into the unique indexes, which will trigger table lookups. Of course, there is a good probability that the table block would be in memory (un-pinned). Still, contention for a shared resource is avoided if the snapshot info is stored with the indexes. Let me get one more clarification: what kind of performance results from this implementation would encourage the hackers community to accept the extra maintenance overhead? Thanks, Gokul.
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram [EMAIL PROTECTED] writes:

On 10/9/07, Gokulakannan Somasundaram [EMAIL PROTECTED] wrote: A function is said to be deterministic if it returns the same value irrespective of how many times it is invoked. I think this definition clearly puts the random function in the non-deterministic category. If we have such a classification, do you think we can resolve this issue? If we frame a set of guidelines/test procedure, do you think it might solve the issue? Even if we don't allow this type of indexing for anything other than built-in deterministic functions, I feel it would serve most of the indexing requirements.

We already do this. Cf. IMMUTABLE at http://www.postgresql.org/docs/8.2/interactive/xfunc-volatility.html and http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Including Snapshot Info with Indexes
[snip] In the case of user-defined functions, the user should be defining them as deterministic.

The user CAN already define his functions as deterministic = IMMUTABLE... the problem is that many of us will define functions as immutable when in fact they are not, and do that by mistake... and there's nothing Postgres can do about that.

Can we frame a set of guidelines, or maybe some test procedure, which can declare a certain function as deterministic?

You mean Postgres should check your function to see if it is really immutable? I can't imagine any way to do it correctly in reasonable time :-) Imagine a function of 10 parameters which returns the sum of the parameters all the time, except that for parameters all 1 it will randomly return a value _once in a thousand executions_... please find a generic algorithm which spots this function as not immutable in reasonable execution time ;-) This example is a bit extreme, but don't underestimate the user ;-)

I am just saying this off the top of my head. Even otherwise, if we can restrict this indexing to only built-in deterministic functions, don't you think it would help the cause of a majority? I have made the proposal to create the index with snapshot an optional one.

Restrictions like this are always confusing for the end user (i.e., why can I use built-ins here and not my own?). I leave it to the actual coders to say anything about code maintenance concerns... Cheers, Csaba.
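[Editorial note: Csaba's pathological function can be made concrete. This is a purely illustrative Python toy, not anything from the thread's code; it shows why black-box testing cannot establish immutability: the function misbehaves so rarely that random sampling will almost never observe it.]

```python
import random

def almost_immutable(*args):
    # Returns the sum of its arguments... except that for all-ones input
    # it misbehaves roughly once in a thousand calls.
    if all(a == 1 for a in args) and random.random() < 0.001:
        return random.randint(0, 10**9)   # the rare non-deterministic case
    return sum(args)

# A black-box tester sampling typical inputs will almost certainly
# conclude the function is immutable:
ok = all(almost_immutable(2, 3) == 5 for _ in range(10_000))
print(ok)  # True -- yet the function is not immutable
```

Any finite number of probes only samples a vanishing fraction of the input space, which is why the later posts in the thread turn to static analysis of the function's source instead.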
Re: [HACKERS] Including Snapshot Info with Indexes
Csaba Nagy wrote:

Can we frame a set of guidelines, or maybe some test procedure, which can declare a certain function as deterministic?

You mean Postgres should check your function to see if it is really immutable? I can't imagine any way to do it correctly in reasonable time :-) Imagine a function of 10 parameters which returns the sum of the parameters all the time, except that for parameters all 1 it will randomly return a value _once in a thousand executions_... please find a generic algorithm which spots this function as not immutable in reasonable execution time ;-) This example is a bit extreme, but don't underestimate the user ;-)

I think you're overly pessimistic here ;-) This classification can be done quite efficiently as long as your language is static enough. The trick is not to execute the function, but to scan the code to find all other functions and SQL statements a given function may possibly call. If your function calls no SQL statements, and only other functions already marked IMMUTABLE, then it must be IMMUTABLE itself. It does seem that only pl/pgsql is static enough for this to work, though, making this idea rather unappealing.

I am just saying this off the top of my head. Even otherwise, if we can restrict this indexing to only built-in deterministic functions, don't you think it would help the cause of a majority? I have made the proposal to create the index with snapshot an optional one.

Restrictions like this are always confusing for the end user (i.e., why can I use built-ins here and not my own?). I leave it to the actual coders to say anything about code maintenance concerns...

Yes, and some built-ins have gotten that classification wrong too in the past, IIRC. Which probably is a good reason not to trust our users to get it right ;-) greetings, Florian Pflug
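[Editorial note: Florian's static classification could be sketched like this. This is a hypothetical Python model of the idea, not real pl/pgsql analysis; the function bodies, the `query(...)` stand-in for SQL execution, and the regex-based call extraction are all invented for illustration. A function is inferred immutable if it runs no SQL and transitively calls only known-immutable functions.]

```python
import re

# Stand-ins for built-ins already marked IMMUTABLE in the catalog.
IMMUTABLE_BUILTINS = {"abs", "lower", "length"}

# Function name -> toy source body (stand-in for pl/pgsql source).
FUNCTIONS = {
    "clean":  "return lower(abs(x))",
    "score":  "return length(clean(x))",
    "lookup": "return query('SELECT ...')",   # executes SQL -> not immutable
    "rate":   "return lookup(x) + score(x)",
}

CALL_RE = re.compile(r"([a-z_]+)\s*\(")

def is_immutable(name, seen=None):
    """Infer immutability by scanning the call graph, never executing
    the function: no SQL, and only immutable callees, recursively."""
    if name in IMMUTABLE_BUILTINS:
        return True
    seen = seen or set()
    if name in seen:                     # be conservative about recursion
        return False
    body = FUNCTIONS.get(name)
    if body is None or "query(" in body:  # unknown callee or SQL execution
        return False
    return all(is_immutable(callee, seen | {name})
               for callee in CALL_RE.findall(body))

print(is_immutable("score"))   # True: only calls clean/length/abs
print(is_immutable("rate"))    # False: transitively executes SQL via lookup
```

This mirrors Florian's point: the analysis is cheap and sound for a "static enough" language, but it depends on being able to see and parse the source, which is exactly what fails for pl/perl, pl/python, or C functions.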
Re: [HACKERS] Including Snapshot Info with Indexes
I think you're overly pessimistic here ;-) This classification can be done quite efficiently as long as your language is static enough. The trick is not to execute the function, but to scan the code to find all other functions and SQL statements a given function may possibly call. If your function calls no SQL statements, and only other functions already marked IMMUTABLE, then it must be IMMUTABLE itself.

OK, I have a black-box mindset right now due to the problem I'm currently working on, so I didn't even think about checking the source code of the function (which is the right thing to do if you have the source code)... in which case you're right, I was overly pessimistic :-) Cheers, Csaba.
Re: [HACKERS] Including Snapshot Info with Indexes
Csaba Nagy wrote: You mean Postgres should check your function to see if it is really immutable? I can't imagine any way to do it correctly in reasonable time :-)

I would say that in the general case it's analogous to the halting problem: not solvable at all, let alone in any reasonable time. cheers andrew
Re: [HACKERS] Including Snapshot Info with Indexes
Florian G. Pflug wrote: I think you're overly pessimistic here ;-) This classification can be done quite efficiently as long as your language is static enough. The trick is not to execute the function, but to scan the code to find all other functions and SQL statements a given function may possibly call. If your function calls no SQL statements, and only other functions already marked IMMUTABLE, then it must be IMMUTABLE itself. It does seem that only pl/pgsql is static enough for this to work, though, making this idea rather unappealing.

How would you propose to analyse C functions, for which you might not have the C code? cheers andrew
Re: [HACKERS] Including Snapshot Info with Indexes
On Tue, 2007-10-09 at 11:22 -0400, Andrew Dunstan wrote:

Csaba Nagy wrote: You mean Postgres should check your function to see if it is really immutable? I can't imagine any way to do it correctly in reasonable time :-)

I would say that in the general case it's analogous to the halting problem: not solvable at all, let alone in any reasonable time.

In the light of Florian's mail, I would say that in the context of a language which can check each of its constructs for immutability, a procedure using only immutable constructs should itself be immutable... the halting problem is avoided in that you don't really need to know if/how the procedure works; you only need to know that it will always work the same ;-) The problem is that in the general case the languages don't have such checks available, so either you restrict the immutability check to simple languages ("static enough", as Florian would say) or you must allow the user to decide if the function is immutable or not. In the general case I assume the users will want the power to decide (and potentially be wrong), and will expect that if they do make a mistake, the result won't be catastrophic. I guess this is the same conclusion as in previous threads about the subject... Cheers, Csaba.
Re: [HACKERS] Including Snapshot Info with Indexes
Andrew Dunstan wrote:

Florian G. Pflug wrote: I think you're overly pessimistic here ;-) This classification can be done quite efficiently as long as your language is static enough. The trick is not to execute the function, but to scan the code to find all other functions and SQL statements a given function may possibly call. If your function calls no SQL statements, and only other functions already marked IMMUTABLE, then it must be IMMUTABLE itself. It does seem that only pl/pgsql is static enough for this to work, though, making this idea rather unappealing.

How would you propose to analyse C functions, for which you might not have the C code?

Scanning the binary, together with symbol annotations for immutability, of course ;-)) No, seriously: I do *not* advocate that we actually autoclassify functions, for a lot of reasons. I just wanted to refute the statement that doing so is generally impossible - it's not. It's trivial for some languages (in Haskell, for example, all functions that don't use monads are immutable, and their signatures tell whether they use monads or not), realistic for others (pl/pgsql, where we do have the source code), and utterly impossible for others (pl/{ruby,python,perl,...}, pl/c, ...).

Besides - AFAICS *anything* that makes VACUUM depend on IMMUTABLE to be correct would instantly break tsearch, no? At least as long as we allow changing the stopwords and the like of dictionaries used by an index - which we'd better allow, unless we want the DBAs to come after us with pitchforks... regards, Florian Pflug, who shudders when imagining DBAs with pitchforks...
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote: Currently the index implementation in PostgreSQL does not store the snapshot information in the index. If we add the snapshot information into the indexing structure, we will have the following advantages.

This idea has been discussed to death many times before. Please search the archives.

a) There can be index-only scans like Oracle.

IMO, the most promising approach to achieving index-only scans at the moment is the Dead Space Map, as discussed in the 8.3 dev cycle.

b) Unique indexes will become less costly, as older index tuples can be found.

Doesn't seem like a big benefit, considering that in most cases there won't be any tuples in the index with a duplicate key. A common exception to that is (non-HOT) updating a row. But in that case, the page containing the old tuple is already in cache, so the lookup of the visibility from the heap is cheap.

c) Even the index scans will get faster, since some of the index tuples won't translate into heap scans.

That's the same as doing an index-only scan, right?

d) Deletes and updates will become slightly costly, as they have to update these indexes.

I think you're grossly underestimating the cost of that. For example, on a table with 3 indexes, a delete currently requires one index lookup + one heap lookup. With visibility in the indexes, it would require 3 index lookups + one heap lookup. That's 4 vs. 2 page accesses, not taking into account the non-leaf b-tree pages. The real impact will depend on what's in cache, but the cost can be very high.

Also, the full visibility information would need 12 bytes of space per tuple. An index tuple on an int4 key currently takes 12 bytes, so that would double the index size. Storage size has a big impact on performance. More bytes means more I/O, less data fits in cache, and more WAL traffic.

There are non-trivial implementation issues involved as well. You'd need a way to reliably find all the index pointers for a given heap tuple (search the archives for "retail vacuum" for the issues involved in that; broken user-defined functions are a problem, for example). And you'd need to keep them all locked at the same time to modify them all atomically, which is prone to deadlocks. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
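[Editorial note: Heikki's cost and size estimates above can be checked with simple arithmetic, using only the numbers stated in the mail. The real on-disk layout involves alignment and header details not modeled here.]

```python
# Space cost: full visibility info adds ~12 bytes per index tuple,
# while an int4 btree index tuple is ~12 bytes today (per the mail).
current_int4_index_tuple = 12   # bytes
visibility_info = 12            # bytes

with_visibility = current_int4_index_tuple + visibility_info
print(with_visibility)                              # 24
print(with_visibility / current_int4_index_tuple)   # 2.0 -> index size doubles

# Delete cost in leaf-page accesses, for a table with 3 indexes:
indexes = 3
current_accesses  = 1 + 1        # one index lookup + one heap lookup
proposed_accesses = indexes + 1  # must visit every index + the heap
print(current_accesses, proposed_accesses)          # 2 4
```

The doubling of index size and the 2-vs-4 page accesses are exactly the figures quoted in the argument above; adding more indexes makes the write amplification strictly worse.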
Re: [HACKERS] Including Snapshot Info with Indexes
On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote: This idea has been discussed to death many times before. Please search the archives.

Somewhat related to the "visibility in index" thing: would it be possible to have a kind of index-table? We have here some tables which have 2-4 fields whose combination forms the primary key of the table. There are usually no other indexes on the table, and the net result is that the PK index duplicates the heap. So it would be cool if the index could be THE heap somehow... The most prominent example of this in our DBs is this table:

db=# \d table_a
     Table "public.table_a"
 Column |  Type  | Modifiers
--------+--------+-----------
 id_1   | bigint | not null
 id_2   | bigint | not null
Indexes:
    "pk_table_a" PRIMARY KEY, btree (id_1, id_2)

db=# select reltuples::bigint, relpages from pg_class where relname='table_a';
 reltuples | relpages
-----------+----------
 445286464 |   710090
(1 row)

db=# select reltuples::bigint, relpages from pg_class where relname='pk_table_a';
 reltuples | relpages
-----------+----------
 445291072 |   599848
(1 row)

This Postgres install is compiled with a 32K page size (for those wondering about the page counts). In any case, it is clear that the index basically duplicates the heap... Cheers, Csaba.
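[Editorial note: the pg_class numbers in Csaba's mail translate to roughly the following per-tuple sizes. This is back-of-the-envelope arithmetic from the figures as posted (32K pages, as he notes); it ignores page headers, item pointers, and fill factor, so the values are rough averages.]

```python
page_size = 32 * 1024  # this install uses 32K pages

heap_pages, heap_tuples   = 710090, 445286464   # table_a
index_pages, index_tuples = 599848, 445291072   # pk_table_a

heap_bytes_per_tuple  = heap_pages  * page_size / heap_tuples
index_bytes_per_tuple = index_pages * page_size / index_tuples

print(f"{heap_bytes_per_tuple:.0f}")       # 52  bytes per heap tuple
print(f"{index_bytes_per_tuple:.0f}")      # 44  bytes per index tuple
print(f"{index_pages / heap_pages:.2f}")   # 0.84 -> index is ~84% of heap size
```

So the PK index really does come close to duplicating the heap, which is the motivation for the index-organized-table / clustered-index discussion that follows.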
Re: [HACKERS] Including Snapshot Info with Indexes
Csaba Nagy wrote: On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote: This idea has been discussed to death many times before. Please search the archives. Somewhat related to the "visibility in index" thing: would it be possible to have a kind of index-table? We have here some tables which have 2-4 fields whose combination forms the primary key of the table. There are usually no other indexes on the table, and the net result is that the PK index duplicates the heap. So it would be cool if the index could be THE heap somehow...

The clustered index patch I worked on for 8.3, but which didn't make it in, would have helped that use case a lot.

A column-store kind of mechanism would be nice: some columns could be stored in index-like structures, while others would be in the heap. I haven't seen any practical proposal on how to do it, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/8/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

Gokulakannan Somasundaram wrote: Currently the index implementation in PostgreSQL does not store the snapshot information in the index. If we add the snapshot information into the indexing structure, we will have the following advantages.

This idea has been discussed to death many times before. Please search the archives.

a) There can be index-only scans like Oracle.

IMO, the most promising approach to achieving index-only scans at the moment is the Dead Space Map, as discussed in the 8.3 dev cycle.

Index-only scans mean that in order to get certain results, we may not go to the table at all. For example, if you have an index on columns a and b, and there is a query like "select b from table where a between a1 and a2", then the plan need not go to the table. I can't understand how the Dead Space Map will provide such functionality. In short, each index will act like an index-organized table if all the columns of the query are present in the index.

b) Unique indexes will become less costly, as older index tuples can be found.

Doesn't seem like a big benefit, considering that in most cases there won't be any tuples in the index with a duplicate key. A common exception to that is (non-HOT) updating a row. But in that case, the page containing the old tuple is already in cache, so the lookup of the visibility from the heap is cheap.

It's not a big benefit, agreed.

c) Even the index scans will get faster, since some of the index tuples won't translate into heap scans.

That's the same as doing an index-only scan, right?

No. Say you have an index on a. If there is a query like "select * from table where a between a1 and a2", currently the scan goes to the table to verify visibility. (Of course, if the tuple is dead to everyone - i.e. would satisfy vacuum - it is marked in the index, which is an optimization.) This is not an index-only scan; this is a normal index scan which can skip certain random I/Os.

d) Deletes and updates will become slightly costly, as they have to update these indexes.

I think you're grossly underestimating the cost of that. For example, on a table with 3 indexes, a delete currently requires one index lookup + one heap lookup. With visibility in the indexes, it would require 3 index lookups + one heap lookup. That's 4 vs. 2 page accesses, not taking into account the non-leaf b-tree pages. The real impact will depend on what's in cache, but the cost can be very high.

That's true. But I am not asking to replace the current index implementation, only to provide an extra option while indexing. Say a particular database setup doesn't do many deletes and updates (imagine tables with partitioning, where the partitions/tables are dropped instead of deleted). They can have an option to "create index ... with snapshot". Imagine that the index vacuum will also do fewer random I/Os.

Also, the full visibility information would need 12 bytes of space per tuple. An index tuple on an int4 key currently takes 12 bytes, so that would double the index size. Storage size has a big impact on performance. More bytes means more I/O, less data fits in cache, and more WAL traffic.

I am thinking of certain optimizations here. We have an unused bit in the index tuple structure. If a particular tuple is not deleted, we can signify that using that bit and save 6 bytes by not storing xmax and cmax. We are trading off this space efficiency against random I/Os, which is not a bad trade-off, I suppose. Again, this is going to be optional for the user. If users have an option to create a bitmap index / binary index, why can't they have this option as well?

There are non-trivial implementation issues involved as well. You'd need a way to reliably find all the index pointers for a given heap tuple (search the archives for "retail vacuum" for the issues involved in that; broken user-defined functions are a problem, for example). And you'd need to keep them all locked at the same time to modify them all atomically, which is prone to deadlocks.

I think vacuum need not go to the table, as the visibility information is present in the index itself. I don't know whether I have given the correct answer here; expecting your reply. Thanks, Gokul.
Re: [HACKERS] Including Snapshot Info with Indexes
On 10/8/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:

Csaba Nagy wrote: On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote: This idea has been discussed to death many times before. Please search the archives. Somewhat related to the "visibility in index" thing: would it be possible to have a kind of index-table? We have here some tables which have 2-4 fields whose combination forms the primary key of the table. There are usually no other indexes on the table, and the net result is that the PK index duplicates the heap. So it would be cool if the index could be THE heap somehow...

The clustered index patch I worked on for 8.3, but which didn't make it in, would have helped that use case a lot. A column-store kind of mechanism would be nice: some columns could be stored in index-like structures, while others would be in the heap. I haven't seen any practical proposal on how to do it, though.

I think it more resembles Oracle's IOT with overflow. I think my proposal, once implemented, can be easily extended to incorporate IOTs/clustered indexes. One thing is for sure: without storing visibility info, unique secondary indexes (indexes on IOT/clustered-index tables) are not possible, as it is not possible to re-locate the same entry in a b-tree if we are storing the primary key in place of the tuple-id. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote: On 10/8/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: IMO, the most promising approach to achieving index-only-scans at the moment is the Dead Space Map, as discussed in the 8.3 dev cycle. Index only scans means that in order to get certain results, we may not goto the table at all. For example, if you have an index on columns a and b, and if there is a query like select b from table where a between a1 and a2, then the explain plan need not goto the table. I can't understand how dead space map will provide such a functionality. Please read the discussions in the archives. The dead space map holds visibility information in a condensed form. For index-only-scans, we need to know if all tuples on page are are visible to us. If the dead space map is designed with index-only-scans in mind, we can store a bit there indicating all tuples on this page are visible to everyone. Pages that have that bit set don't need to be visited to check visibility. What information exactly is going to be stored in the dead space map is still debated. For vacuuming, we need to know which pages contain dead tuples that are worth vacuuming, which isn't the same thing as all tuples are visible to everyone, but it's quite close. Heap pages that do have dead or recently modified rows do need to be visited, so the executor needs to always be prepared to check visibility from the heap. However, on a table that's not very frequently updated, most bits will be set and the effect will be the same as genuine index-only-scans. On a table that is frequently updated, you would suffer a big hit in update performance with the duplicate visibility information in all indexes scheme as well, as the updates would need to update the indexes as well, so the performance characteristics are roughly the same. That's true. But i am not asking to replace the current index implementation, but to provide an extra option while indexing. 
Say a particular database setup doesn't do many deletes and updates (imagine tables with partitioning, where the partitions are dropped instead of rows being deleted). They can have an option to create index .. with snapshot.

A nice property of utilizing the dead space map for index-only-scans is that it doesn't require any action from the DBA. It will just work. It also adapts well to tables that have parts that are frequently updated and other parts that are not. In the frequently updated parts things will behave like what we have now - index-only-scans are not possible, but deletes/updates are cheap. But the less frequently updated parts will eventually have the bits set in the map, and we can do index-only-scans for those parts.

Imagine that index vacuum will also do fewer random I/Os.

Index vacuum doesn't do random I/Os as it is. Also, the full visibility information would need 12 bytes of space per tuple. An index tuple on an int4 key currently takes 12 bytes, so that would double the index size. Storage size has a big impact on performance. More bytes means more I/O, less data fits in cache, and more WAL traffic.

I am thinking of certain optimizations here. We have an unused bit in the index tuple structure. If a particular tuple is not deleted, then we can signify that using that bit and save the 6 bytes of storing xmax and cmax. We are trading off this space efficiency against random I/Os, which is not a bad trade-off, I suppose. Again, this is going to be optional for the user. If users have an option to create a bitmap index / binary index, why can't they have this option as well? Because we have to maintain it? :)

Speaking of bitmap indexes, that would be nice to have. It looks like Gavin dropped the ball on the patch, so if you want to continue that work, that would be great. There are non-trivial implementation issues involved as well.
You'd need a way to reliably find all the index pointers for a given heap tuple (search the archives for "retail vacuum" for the issues involved in that; broken user-defined functions are a problem, for example). And you'd need to keep them all locked at the same time to modify them all atomically, which is prone to deadlocks.

I think vacuum need not go to the table, as the visibility information is present in the index itself.

Vacuum isn't the problem here. The problem is: given heap tuple X, how do you find the corresponding index tuples? The obvious solution is to calculate the index keys from the heap tuple, and use them to look up. But what if you have an expression index on a user-defined function, and that user-defined function is broken so that it returns a different value than when the index tuple was inserted? You won't find the index tuples in that case, so you won't be able to update the visibility information. Granted, you've got a broken user-defined function in that case, and you're going to get bogus query results anyway. But not finding the index tuple when needed would lead to more serious problems.
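The failure mode described above can be illustrated with a toy Python sketch (all names here are hypothetical; this models only the idea, not PostgreSQL's actual structures): an expression index stores keys computed at insertion time, so if the function later returns different values, recomputing a tuple's key no longer finds its index pointer.

```python
# Toy model of an expression index: maps f(heap_value) -> heap position.
# Hypothetical names; this only illustrates the retail-vacuum failure mode.

def build_index(heap, f):
    """Index each heap tuple under the key f(value), as of insertion time."""
    return {f(value): ctid for ctid, value in heap.items()}

def retail_vacuum_lookup(index, f, value):
    """To remove a heap tuple's index pointer, recompute its key with f."""
    return index.get(f(value))

heap = {1: "apple", 2: "banana"}

# A well-behaved immutable function: same input, same output, always.
good = str.upper
idx = build_index(heap, good)
assert retail_vacuum_lookup(idx, good, "apple") == 1   # pointer found

# A "broken immutable" function: its behavior changed after the index
# was built (e.g. the UDF was redefined), so the recomputed key differs.
broken = str.lower
assert retail_vacuum_lookup(idx, broken, "apple") is None  # pointer lost
```

With duplicated visibility info in the index, a lost pointer would mean stale visibility data that can never be corrected, which is why this matters more than ordinary wrong results.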
Re: [HACKERS] Including Snapshot Info with Indexes
On Monday, 2007-10-08 at 11:41, Heikki Linnakangas wrote:

The dead space map holds visibility information in a condensed form. For index-only-scans, we need to know if all tuples on a page are visible to us. If the dead space map is designed with index-only-scans in mind, we can store a bit there indicating that all tuples on the page are visible to everyone. Pages that have that bit set don't need to be visited to check visibility. What information exactly is going to be stored in the dead space map is still debated. For vacuuming, we need to know which pages contain dead tuples that are worth vacuuming, which isn't the same thing as "all tuples are visible to everyone", but it's quite close.

I would prefer a separate MVCC visibility heap (a.k.a. extended dead space map) which would duplicate the whole visibility info from heap pages, just in compressed form. After a few releases with duplicated visibility info, we could remove it from the data heap.

If the whole visibility info (cmin, cmax, tmin, tmax, flags (+ size for DSM uses)) for tuples were in a separate heap, it would allow for a lot of on-the-fly compression. For example we could:

* get rid of both tmin and tmax for all completed transactions
* reduce any deleted tuple to just flags
* reduce any tuple produced by an aborted transaction to just flags
* reduce any tuple visible to all backends to just flags
* RLE-compress (runs of) pages produced by the same transaction
* RLE-compress (runs of) pages with all tuples visible
* RLE-compress (runs of) pages with all tuples deleted

Depending on the distribution of inserts/updates/deletes, this will make the visibility info a little or a lot smaller than it is currently, greatly enhancing the chances that it stays in cache (even for OLAP loads, because data for these is usually produced by bulk inserts and thus its visibility info is highly compressible). It also makes VACUUM more efficient, as its initial scan (finding vacuumable tuples) will need to do a lot less I/O.
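The run-length idea above can be sketched in a few lines of Python (a toy model, not a proposed on-disk format): runs of consecutive pages whose tuples share the same visibility state collapse to one entry each, so a bulk load of many pages compresses to a handful of entries.

```python
# Toy run-length encoding of per-page visibility states (hypothetical model).
# Bulk loads produce long runs of identical states, so the compressed form
# can be tiny compared to one visibility entry per page.

def rle_encode(page_states):
    """Collapse runs of equal states into [state, run_length] pairs."""
    runs = []
    for state in page_states:
        if runs and runs[-1][0] == state:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([state, 1])  # start a new run
    return runs

# 1000 pages loaded by one transaction, 3 pages of deleted tuples,
# then 500 more pages visible to everyone.
states = (["xmin=N,xmax=none"] * 1000
          + ["all-deleted"] * 3
          + ["visible-to-all"] * 500)
runs = rle_encode(states)
assert runs == [["xmin=N,xmax=none", 1000],
                ["all-deleted", 3],
                ["visible-to-all", 500]]  # 1503 pages -> 3 entries
```

Three entries for 1503 pages is the kind of ratio that would let the visibility info for a whole relation stay in L1 cache, as suggested earlier in the thread.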
And it allows for more intelligent choices for new tuple placement, especially if we want to preserve clustering. And of course it gives you a kind of index-only scan (mostly read the index + check in the visibility heap).

- Hannu

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Including Snapshot Info with Indexes
Hi Heikki,

I am always slightly late in understanding things. Let me try to understand the use of the DSM. It is a bitmap index on whether all the tuples in a particular block are visible to all the backends, and whether a particular block contains tuples which are invisible to everyone. But I think this will be subject to the same limitations as a bitmap index. Even Oracle suggests the use of bitmap indexes only for data-warehousing tables, where the bitmap indexes are dropped and recreated after every bulk load. This is not a viable alternative for OLTP transactions. But I think I am late in the game, as I haven't participated in those discussions.

One bitmap index block usually maps to a lot of blocks in the heap, so locking of one page to update the DSM for update/delete/insert would hurt concurrency. But again, all these are my observations w.r.t. Oracle bitmap indexes. Maybe I am missing something in the DSM.

I couldn't find the piece of discussion in the archive which discusses the design of retail vacuum, so please advise me again here. Let's take up retail vacuuming again. A user-defined function which returns different values at different times can be classified as a non-deterministic function. We can say that this kind of index cannot be created on a non-deterministic function. This is the way it is implemented in Oracle: they have classified certain built-in operators and functions as deterministic, and similarly they have classified a few as non-deterministic. Can we follow a similar approach?

Thanks,
Gokul.
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote:

Hi Heikki, I am always slightly late in understanding things. Let me try to understand the use of the DSM. It is a bitmap index on whether all the tuples in a particular block are visible to all the backends, and whether a particular block contains tuples which are invisible to everyone. But I think this will be subject to the same limitations as a bitmap index. Even Oracle suggests the use of bitmap indexes only for data-warehousing tables, where the bitmap indexes are dropped and recreated after every bulk load. This is not a viable alternative for OLTP transactions. But I think I am late in the game, as I haven't participated in those discussions.

While the DSM might be similar in spirit to a bitmap index, the actual implementation has a lot more freedom, I'd say, since you can tailor it exactly to the needs of tracking some summarized visibility info. So not all shortcomings of bitmap indices must necessarily apply to the DSM. But of course that's mostly hand-waving...

One bitmap index block usually maps to a lot of blocks in the heap, so locking of one page to update the DSM for update/delete/insert would hurt concurrency. But again, all these are my observations w.r.t. Oracle bitmap indexes. Maybe I am missing something in the DSM.

A simple DSM would probably contain a bit per page that says "all xmin < GlobalXmin, and all xmax unset or aborted". That bit would only get set during VACUUM, and only unset during INSERT/UPDATE/DELETE. If setting it is protected by a VACUUM-grade lock on the page, we might get away with no locking during the unset, making the locking overhead pretty small.

I couldn't find the piece of discussion in the archive which discusses the design of retail vacuum, so please advise me again here. Let's take up retail vacuuming again. A user-defined function which returns different values at different times can be classified as a non-deterministic function. We can say that this kind of index cannot be created on a non-deterministic function. This is the way it is implemented in Oracle: they have classified certain built-in operators and functions as deterministic, and similarly a few as non-deterministic. Can we follow a similar approach?

Postgres already distinguishes VOLATILE, STABLE and IMMUTABLE functions. It doesn't, however, risk physical data corruption, even if you get that classification wrong. The worst that happens AFAIK are wrong query results - but fixing your function, followed by a REINDEX, always corrects the problem. If you start poking holes into that safety net, there'll be a lot of pushback, I believe - and IMHO rightly so, because people do, and always will, get such classifications wrong.

greetings, Florian Pflug
Re: [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote:

I am always slightly late in understanding things. Let me try to understand the use of the DSM. It is a bitmap index on whether all the tuples in a particular block are visible to all the backends, and whether a particular block contains tuples which are invisible to everyone. But I think this will be subject to the same limitations as a bitmap index. Even Oracle suggests the use of bitmap indexes only for data-warehousing tables, where the bitmap indexes are dropped and recreated after every bulk load. This is not a viable alternative for OLTP transactions.

Well, it's not quite the same as a bitmap index, though both use a bitmap. You didn't quite get into details on what the limitations are and why it wouldn't be suitable for OLTP, but I don't see any significant problems.

But I think I am late in the game, as I haven't participated in those discussions.

Better late than never :).

One bitmap index block usually maps to a lot of blocks in the heap, so locking of one page to update the DSM for update/delete/insert would hurt concurrency. But again, all these are my observations w.r.t. Oracle bitmap indexes. Maybe I am missing something in the DSM.

Yeah, the DSM page could become a contention bottleneck. My current thinking is that we'd have a flag in the heap page header that would be set together with the bit in the DSM. When the flag in the page header is set, you don't need to lock and update the DSM, because you know the bit is already set. Vacuum would have to clear both the DSM bit and the flag.

Let's take up retail vacuuming again. A user-defined function which returns different values at different times can be classified as a non-deterministic function. We can say that this kind of index cannot be created on a non-deterministic function. This is the way it is implemented in Oracle: they have classified certain built-in operators and functions as deterministic, and similarly a few as non-deterministic. Can we follow a similar approach?

We already do. A function must be marked as IMMUTABLE in order to use it in an index expression. But we can't enforce that a user-defined function really behaves like an immutable function should. If someone creates a user-defined function in C that calls the C random() function, we can't stop it. As I said earlier, using an index like that will of course lead to bogus results. But it won't currently cause any server crashes or more serious corruption.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com