Re: [HACKERS] Unlogged tables, persistent kind
On Tue, Apr 26, 2011 at 8:49 AM, Leonardo Francalanci m_li...@yahoo.it wrote:

>>> If that 1% is random (not time/transaction related), usually you'd rather have an empty table.
>> Why do you think it would be random? Heap blocks would be zeroed if they were found to be damaged, following a crash.
>
> If you erase full blocks, you have no idea what data you erased; it could be something changed 1 hour ago, 1 month ago, 1 year ago. This is very different from, say, synchronous_commit=off: in that case, the most recent transactions may be lost if the database should crash. In your case, some (who knows which???) data is lost. So, to me that sounds like random loss. I don't think that that is different from a corrupted table. You're not deleting rows recently changed; you're deleting everything that is physically close to them.
>
>>> In other words: is a table that is not consistent with anything else in the db useful?
>> That's too big a leap. Why would it suddenly be inconsistent with the rest of the database?
>
> If you delete some data, and you have no idea what data you lost, I don't think you have a consistent db. Unless, of course, your table has no relation with any other table in the db. Of course, all these thoughts are based on the assumption that I know what happens when a block is erased; but my knowledge of postgresql internals is not so good, so I might be *very* wrong.

You're assuming that there are referential links *from* other tables to the table with damage. In which case you would be correct. But of course, if you needed that data for integrity you would never do that, so the problem is a nonexistent use case. The suggested mode is for Fact data, not reference tables.

The current assessment is that UNLOGGED tables are useful only for running a data cache. If the database crashes, then the table is truncated and you must refill the cache. If that is the case, then it must surely be better to have a cache that is already 99% full than one which starts at empty. There is no damage or loss because parts of the cache were missing. Unlogged tables are currently so volatile they are unusable for any other purpose.

I want to see a table that is useful for low value data, such as sensor data. If you had 10 TB of sensor data and the database crashes, then you want to lose a few blocks, not the whole lot. Low value = rare, minor loss is acceptable, but it doesn't mean total data loss is acceptable. For that use case, total loss is catastrophic, not just mildly irritating. If you are a Telco, losing a few minutes of billing data costs much less than giving every server better hardware so it can cope with high WAL traffic as well. They don't want to lose the data, but it's a cost-based trade-off.

Consistency is not an issue; you are just missing some data. That is normal anyway, since sensor data generators (mobile devices etc.) frequently fail, are offline, turned off etc., so there isn't even a definition of what complete data is supposed to look like. The missing data looks exactly like lots of people turned their phones off for a few minutes.

So my suggestion makes UNLOGGED tables more useful for the use case they were designed to address - cached data (AIUI) - plus they allow another use case that doesn't seem to be well understood: low value data in massive data volumes.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unlogged tables, persistent kind
On Tue, May 3, 2011 at 8:21 PM, Simon Riggs si...@2ndquadrant.com wrote:

> The current assessment is that UNLOGGED tables are useful only for running a data cache. If the database crashes, then the table is truncated and you must refill the cache. If that is the case, then it must surely be better to have a cache that is already 99% full than one which starts at empty. There is no damage or loss because parts of the cache were missing.

That's not necessarily the case, of course. I've written caches before where a set of records would be present for each object being cached. I could deal with a whole set not being present, but if just some of the records were missing then I would serve incorrect results. Since they were created in a single transaction, I should be able to rely on the whole set being present or missing consistently.

That doesn't mean there aren't cases where that's true, of course. In an ideal world the database would guarantee that you couldn't use the table in this way and fail to get the consistency you expect. Something like getting an error if you try to modify two rows in an unlogged table during a single transaction. I'm not sure how to do that reasonably, though.

> For that use case, total loss is catastrophic, not just mildly irritating. If you are a Telco, losing a few minutes of billing data costs much less than giving every server better hardware so it can cope with high WAL traffic as well. They don't want to lose the data, but it's a cost-based trade-off.

This analysis is dead on. That's precisely how businesses evaluate this question.

> Consistency is not an issue; you are just missing some data. That is normal anyway, since sensor data generators (mobile devices etc.) frequently fail, are offline, turned off etc., so there isn't even a definition of what complete data is supposed to look like. The missing data looks exactly like lots of people turned their phones off for a few minutes.

I don't think that's true, however. Consider if I have a rollup table that contains aggregated sums of that data. If you lose some of the records, then my aggregates don't add up to the correct values any more. Or consider if you are counting sensor data like total data transferred and session count -- and then reporting the average data transferred per session. And you accidentally lose a bunch of sessions. Now your data/session report will be reporting false information. This is true even if you only lose recently committed rows. But losing whole blocks means you risk losing random old data, which makes it hard to work around by, say, purging recently aggregated data.

> So my suggestion makes UNLOGGED tables more useful for the use case they were designed to address - cached data (AIUI) - plus they allow another use case that doesn't seem to be well understood: low value data in massive data volumes.

There are other approaches as well. Foreign data wrappers mean you could do things like store the low value data in raw text files or other systems like memcached or Hadoop or whatever. I'm not saying there's no reason to do something in Postgres, but if you're being bitten by Postgres's block-oriented random access storage, it may be a problem too fundamental to solve without addressing the underlying storage strategy.

-- 
greg
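Greg's rollup example is easy to make concrete. Below is a minimal Python sketch (all names and numbers hypothetical, nothing here is PostgreSQL code) of why zeroing arbitrary old blocks breaks a previously computed aggregate, in a way that losing only the most recent rows would not:

```python
import random

# Hypothetical detail rows for sensor sessions, plus a rollup holding
# their aggregated sum -- as in a data-per-session report.
random.seed(0)
sessions = [{"id": i, "bytes": random.randint(100, 1000)} for i in range(1000)]

# Rollup computed while all rows were present.
rollup_total = sum(s["bytes"] for s in sessions)

# A crash zeroes a "block": a contiguous run of *old* rows, not just
# recently committed ones.
surviving = sessions[:400] + sessions[450:]   # 50 arbitrary rows lost

detail_total = sum(s["bytes"] for s in surviving)
avg_per_session = detail_total / len(surviving)

# The rollup no longer matches the detail rows it was derived from,
# and the per-session average silently shifts.
print(rollup_total == detail_total)      # False
print(rollup_total > detail_total)       # True: rollup overstates details
```

Because the lost rows are old, re-aggregating only recent data cannot repair the mismatch; that is the point Greg makes about purging recently aggregated data.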
Re: [HACKERS] Unlogged tables, persistent kind
Simon Riggs si...@2ndquadrant.com writes:

> You're assuming that there are referential links *from* other tables to the table with damage. In which case you would be correct. But of course, if you needed that data for integrity you would never do that, so the problem is a nonexistent use case. The suggested mode is for Fact data, not reference tables.

So I suppose your notion of fact data includes no fields that are wide enough to need toasting? Because as soon as you have any out-of-line values, there's an equivalent of foreign keys behind the scenes, where the user can't see it (until he gets "missing chunk number" or some such error).

> The current assessment is that UNLOGGED tables are useful only for running a data cache. If the database crashes, then the table is truncated and you must refill the cache. If that is the case, then it must surely be better to have a cache that is already 99% full than one which starts at empty. There is no damage or loss because parts of the cache were missing.

A cache that starts at 99% full of untrustworthy data is NOT better.

> Unlogged tables are currently so volatile they are unusable for any other purpose. I want to see a table that is useful for low value data, such as sensor data.

Basically, you're being hopelessly optimistic both about the extent to which a crash is likely to render data inconsistent, and our ability to detect that inconsistency. It doesn't matter whether the data is low value; the difficulty of cleaning up remains the same. I don't want to deal with trying to detect that, and I definitely don't want to dump the problems onto users.

regards, tom lane
Re: [HACKERS] Unlogged tables, persistent kind
On Tue, May 3, 2011 at 4:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:

>> The current assessment is that UNLOGGED tables are useful only for running a data cache. If the database crashes, then the table is truncated and you must refill the cache. If that is the case, then it must surely be better to have a cache that is already 99% full than one which starts at empty. There is no damage or loss because parts of the cache were missing.
>
> A cache that starts at 99% full of untrustworthy data is NOT better.

That's a bit pessimistic. But the case that bugs me - a cache that's 99% trustworthy, where I have no idea a) that 1% of it is crud, and b) which 1% is crud - is still a pretty unacceptable scenario. I head back to our policy for handling caches: if in doubt, TRUNCATE. That policy would be nicely consistent with the way 9.1 deals with unlogged tables.

>> Unlogged tables are currently so volatile they are unusable for any other purpose. I want to see a table that is useful for low value data, such as sensor data.
>
> Basically, you're being hopelessly optimistic both about the extent to which a crash is likely to render data inconsistent, and our ability to detect that inconsistency. It doesn't matter whether the data is low value; the difficulty of cleaning up remains the same. I don't want to deal with trying to detect that, and I definitely don't want to dump the problems onto users.

+1, on both grounds.

-- 
"When confronted by a difficult problem, solve it by reducing it to the question, 'How would the Lone Ranger handle this?'"
Re: [HACKERS] Unlogged tables, persistent kind
>> If that 1% is random (not time/transaction related), usually you'd rather have an empty table.
> Why do you think it would be random? Heap blocks would be zeroed if they were found to be damaged, following a crash.

If you erase full blocks, you have no idea what data you erased; it could be something changed 1 hour ago, 1 month ago, 1 year ago. This is very different from, say, synchronous_commit=off: in that case, the most recent transactions may be lost if the database should crash. In your case, some (who knows which???) data is lost. So, to me that sounds like random loss. I don't think that that is different from a corrupted table. You're not deleting rows recently changed; you're deleting everything that is physically close to them.

>> In other words: is a table that is not consistent with anything else in the db useful?
> That's too big a leap. Why would it suddenly be inconsistent with the rest of the database?

If you delete some data, and you have no idea what data you lost, I don't think you have a consistent db. Unless, of course, your table has no relation with any other table in the db. Of course, all these thoughts are based on the assumption that I know what happens when a block is erased; but my knowledge of postgresql internals is not so good, so I might be *very* wrong.
Re: [HACKERS] Unlogged tables, persistent kind
2011/4/25 Christopher Browne cbbro...@gmail.com:

> On Mon, Apr 25, 2011 at 2:03 PM, Jesper Krogh jes...@krogh.cc wrote:
>> On 2011-04-25 20:00, Leonardo Francalanci wrote:
>>>> The amount of data loss on a big table will be 1% of the data loss caused by truncating the whole table.
>>> If that 1% is random (not time/transaction related), usually you'd rather have an empty table. In other words: is a table that is not consistent with anything else in the db useful?
>> Depends on the application; if it serves for pure caching then it is fully acceptable and way better than dropping everything.
>
> Whoah... When caching, the application already needs to be able to cope with the case where there's nothing in the cache. This means that if the cache gets truncated, it's reasonable to expect that the application won't get deranged - it already needs to cope with the case where data's not there and needs to get constructed.

That is true, but the application's performance already has to cope with a server crash/restart. Anything you can add to make the restart more 'smooth' for the application is good.

> In contrast, if *wrong* data is in the cache, that could very well lead to wrong behavior on the part of the application. And there may not be any mechanism aside from cache truncation that will rectify that. It seems to me that it's a lot riskier to try to preserve the contents of such tables than it is to truncate them.

-- 
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Unlogged tables, persistent kind
On Apr 24, 2011, at 1:22 PM, Simon Riggs si...@2ndquadrant.com wrote:

> Unlogged tables are a good new feature.

Thanks.

> I noticed Bruce had mentioned they were the equivalent of NoSQL, which I don't really accept.

Me neither. I thought that was poorly said.

> Heap blocks would be zeroed if they were found to be damaged, following a crash.

The problem is not so much the blocks that are damaged (e.g. half-written, torn page) but the ones that were never written at all. For example: read page A, read page B, update a tuple on page A putting the new version on page B, write one but not both of A and B out to the OS, crash. Everything on disk is a valid page, but they are not coherent taken as a whole. It's normally XLOG replay that fixes this type of situation...

I thought about this problem a bit, and I think you could perhaps deal with it by having some sort of partially logged table, where we would XLOG just enough to know which blocks or relations had been modified and only nuke enough data to be certain of being safe. But it isn't clear that there is much use case for this, especially because I think it would give up nearly all the performance benefit.

I do think it might be useful to have an unlogged index on a logged table, somehow frobnicated so that on a crash the index is known invalid and not used until a REINDEX is performed.

...Robert
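Robert's two-page example can be made concrete in a few lines. The following is a hypothetical Python simulation (toy page/tuple structures, not PostgreSQL's actual on-disk format) of how a crash can leave every page individually valid while the pair is incoherent:

```python
import copy

# Hypothetical on-disk state: page A holds the only version of key 1.
disk = {
    "A": [{"key": 1, "val": "old", "dead": False}],
    "B": [],
}

# In-memory (buffer cache) copies of both pages: an update marks the
# old version dead on page A and puts the new version on page B.
mem = copy.deepcopy(disk)
mem["A"][0]["dead"] = True
mem["B"].append({"key": 1, "val": "new", "dead": False})

# Crash: only page B reached disk; page A's write was lost.
disk["B"] = mem["B"]

visible = [t for page in disk.values() for t in page if not t["dead"]]
# Two live versions of key 1 now coexist.  No single page is "damaged",
# so a per-block CRC would pass; normally XLOG replay repairs this.
print(len(visible))   # 2
```

Had page A reached disk but not page B, the tuple would instead vanish entirely; either way, no block-local check detects the problem, which is Robert's point later in the thread.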
Re: [HACKERS] Unlogged tables, persistent kind
The only data we can't rebuild is the heap. So what about an option for UNlogged indexes on a LOGged table? It would always preserve data, and it would 'only' cost rebuilding the indexes in case of an unclean shutdown. I think it would give a boost in performance for all those cases where the IO (especially random IO) is caused by the indexes, and it doesn't look too complicated (but maybe I'm missing something).

I proposed the unlogged-to-logged patch (BTW, has anyone given a look at it?) because we partition data based on a timestamp, and we can risk losing the last N minutes of data; but after N minutes we want to know the data will always be there, so we would like to set a partition table to 'logged'.

Leonardo
Re: [HACKERS] Unlogged tables, persistent kind
On Mon, Apr 25, 2011 at 8:36 AM, Leonardo Francalanci m_li...@yahoo.it wrote:

> The only data we can't rebuild is the heap. So what about an option for UNlogged indexes on a LOGged table? It would always preserve data, and it would 'only' cost rebuilding the indexes in case of an unclean shutdown. I think it would give a boost in performance for all those cases where the IO (especially random IO) is caused by the indexes, and it doesn't look too complicated (but maybe I'm missing something).
>
> I proposed the unlogged-to-logged patch (BTW, has anyone given a look at it?) because we partition data based on a timestamp, and we can risk losing the last N minutes of data; but after N minutes we want to know the data will always be there, so we would like to set a partition table to 'logged'.

I agree that unlogged indexes on a logged heap are better for resilience and are likely to be the best first step.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Unlogged tables, persistent kind
On Mon, Apr 25, 2011 at 8:14 AM, Robert Haas robertmh...@gmail.com wrote:

> On Apr 24, 2011, at 1:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
>> Heap blocks would be zeroed if they were found to be damaged, following a crash.
>
> The problem is not so much the blocks that are damaged (e.g. half-written, torn page) but the ones that were never written at all. For example: read page A, read page B, update a tuple on page A putting the new version on page B, write one but not both of A and B out to the OS, crash. Everything on disk is a valid page, but they are not coherent taken as a whole. It's normally XLOG replay that fixes this type of situation...

Not really sure it matters what the cause of data loss is, does it? The zeroing of the blocks definitely causes data loss, but the intention is to bring the table back to a consistent physical state, not to in any way repair the data loss. Repeating my words above: this proposed option trades potential minor data loss for performance. The amount of data loss on a big table will be 1% of the data loss caused by truncating the whole table. This is important on big tables where reloading from a backup might take a long time.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Unlogged tables, persistent kind
On Mon, Apr 25, 2011 at 5:04 AM, Simon Riggs si...@2ndquadrant.com wrote:

> Not really sure it matters what the cause of data loss is, does it? The zeroing of the blocks definitely causes data loss, but the intention is to bring the table back to a consistent physical state, not to in any way repair the data loss.

Right, but the trick is how you identify which blocks you need to zero. You used the word "damaged", which to me implied that the block had been modified in some way but ended up with other than the expected contents, so that something like a CRC check might detect the problem. My point (as perhaps you already understand) is that you could easily have a situation where every block in the table passes a hypothetical block-level CRC check, but the table as a whole is still damaged because update chains aren't coherent. So you need some kind of mechanism for identifying which portions of the table you need to zero to get back to a guaranteed-coherent state.

-- 
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Unlogged tables, persistent kind
On Mon, Apr 25, 2011 at 3:36 AM, Leonardo Francalanci m_li...@yahoo.it wrote:

> The only data we can't rebuild is the heap. So what about an option for UNlogged indexes on a LOGged table? It would always preserve data, and it would 'only' cost rebuilding the indexes in case of an unclean shutdown. I think it would give a boost in performance for all those cases where the IO (especially random IO) is caused by the indexes, and it doesn't look too complicated (but maybe I'm missing something).

+1.

> I proposed the unlogged-to-logged patch (BTW, has anyone given a look at it?) because we partition data based on a timestamp, and we can risk losing the last N minutes of data; but after N minutes we want to know the data will always be there, so we would like to set a partition table to 'logged'.

That approach is something I had also given some thought to, and I'm glad to hear that people are thinking about doing it in the real world. I'm planning to look at your patch, but I haven't gotten to it yet, because I'm giving priority to anything that must be done to get 9.1beta1 out the door.

-- 
Robert Haas EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Unlogged tables, persistent kind
> The amount of data loss on a big table will be 1% of the data loss caused by truncating the whole table.

If that 1% is random (not time/transaction related), usually you'd rather have an empty table. In other words: is a table that is not consistent with anything else in the db useful?
Re: [HACKERS] Unlogged tables, persistent kind
On 2011-04-25 20:00, Leonardo Francalanci wrote:

>> The amount of data loss on a big table will be 1% of the data loss caused by truncating the whole table.
>
> If that 1% is random (not time/transaction related), usually you'd rather have an empty table. In other words: is a table that is not consistent with anything else in the db useful?

Depends on the application; if it serves for pure caching then it is fully acceptable and way better than dropping everything.

-- 
Jesper
Re: [HACKERS] Unlogged tables, persistent kind
Jesper Krogh jes...@krogh.cc wrote:

> Depends on the application; if it serves for pure caching then it is fully acceptable and way better than dropping everything.

I buy this *if* we can be sure we're not keeping information which is duplicated or mangled, and if we can avoid crashing the server to a panic because of broken pointers or other infelicities. I'm not sure that can't be done, but I don't think I've heard an explanation of how it could be accomplished, particularly without overhead which would wipe out the performance benefit of unlogged tables. (And without a performance benefit, what's the point?)

-Kevin
Re: [HACKERS] Unlogged tables, persistent kind
On Mon, Apr 25, 2011 at 2:03 PM, Jesper Krogh jes...@krogh.cc wrote:

> On 2011-04-25 20:00, Leonardo Francalanci wrote:
>>> The amount of data loss on a big table will be 1% of the data loss caused by truncating the whole table.
>> If that 1% is random (not time/transaction related), usually you'd rather have an empty table. In other words: is a table that is not consistent with anything else in the db useful?
> Depends on the application; if it serves for pure caching then it is fully acceptable and way better than dropping everything.

Whoah... When caching, the application already needs to be able to cope with the case where there's nothing in the cache. This means that if the cache gets truncated, it's reasonable to expect that the application won't get deranged - it already needs to cope with the case where data's not there and needs to get constructed.

In contrast, if *wrong* data is in the cache, that could very well lead to wrong behavior on the part of the application. And there may not be any mechanism aside from cache truncation that will rectify that. It seems to me that it's a lot riskier to try to preserve the contents of such tables than it is to truncate them.

-- 
"When confronted by a difficult problem, solve it by reducing it to the question, 'How would the Lone Ranger handle this?'"
Re: [HACKERS] Unlogged tables, persistent kind
On Mon, Apr 25, 2011 at 7:00 PM, Leonardo Francalanci m_li...@yahoo.it wrote:

>> The amount of data loss on a big table will be 1% of the data loss caused by truncating the whole table.
>
> If that 1% is random (not time/transaction related), usually you'd rather have an empty table.

Why do you think it would be random?

> In other words: is a table that is not consistent with anything else in the db useful?

That's too big a leap. Why would it suddenly be inconsistent with the rest of the database?

Not good arguments.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Unlogged tables, persistent kind
On Mon, Apr 25, 2011 at 1:42 PM, Robert Haas robertmh...@gmail.com wrote:

> Right, but the trick is how you identify which blocks you need to zero. You used the word "damaged", which to me implied that the block had been modified in some way but ended up with other than the expected contents, so that something like a CRC check might detect the problem. My point (as perhaps you already understand) is that you could easily have a situation where every block in the table passes a hypothetical block-level CRC check, but the table as a whole is still damaged because update chains aren't coherent. So you need some kind of mechanism for identifying which portions of the table you need to zero to get back to a guaranteed-coherent state.

That sounds like progress. The current mechanism is "truncate complete table". There are clearly other mechanisms that would not remove all data. Probably the common case would be for insert-only data.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Unlogged tables, persistent kind
On Mon, Apr 25, 2011 at 2:21 PM, Simon Riggs si...@2ndquadrant.com wrote: Right, but the trick is how you identify which blocks you need to zero. You used the word damaged, which to me implied that the block had been modified in some way but ended up with other than the expected contents, so that something like a CRC check might detect the problem. My point (as perhaps you already understand) is that you could easily have a situation where every block in the table passes a hypothetical block-level CRC check, but the table as a whole is still damaged because update chains aren't coherent. So you need some kind of mechanism for identifying which portions of the table you need to zero to get back to a guaranteed-coherent state. That sounds like progress. The current mechanism is truncate complete table. There are clearly other mechanisms that would not remove all data. No doubt. Consider a block B. If the system crashes when block B is dirty either in the OS cache or shared_buffers, then you must zero B, or truncate it away. If it was clean in both places, however, it's good data and you can keep it. So you can imagine for example a scheme where imagine that the relation is divided into 8MB chunks, and we WAL-log the first operation after each checkpoint that touches a chunk. Replay zeroes the chunk, and we also invalidate all the indexes (the user must REINDEX to get them working again). I think that would be safe, and certainly the WAL-logging overhead would be far less than WAL-logging every change, since we'd need to emit only ~16 bytes of WAL for every 8MB written, rather than ~8MB of WAL for every 8MB written. It wouldn't allow some of the optimizations that the current unlogged tables can get away with only because they WAL-log exactly nothing - and selectively zeroing chunks of a large table might slow down startup quite a bit - but it might still be useful to someone. 
However, I think that the logged table, unlogged index idea is probably the most promising thing to think about doing first. It's easy to imagine all sorts of uses for that sort of thing even in cases where people can't afford to have any data get zeroed, and it would provide a convenient building block for something like the above if we eventually wanted to go that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Unlogged tables, persistent kind
Robert Haas robertmh...@gmail.com writes: However, I think that the logged table, unlogged index idea is probably the most promising thing to think about doing first. +1 for that --- it's clean, has a clear use-case, and would allow us to manage the current mess around hash indexes more cleanly. That is, hash indexes would always be treated as unlogged. (Or of course we could fix the lack of WAL logging for hash indexes, but I notice a lack of people stepping up to do that.) regards, tom lane
[HACKERS] Unlogged tables, persistent kind
Unlogged tables are a good new feature. I noticed Bruce had mentioned they were the equivalent of NoSQL, which I don't really accept. I guess it depends upon whether you mean NoSQL for caches (e.g. memcached) or NoSQL for useful datastores (e.g. Mongo). It seems worth discussing now before we get too far into the marketing hype around Beta. If you don't log changes to tables, you have two choices if we crash: 1) truncate the table and any indexes 2) rebuild any indexes damaged by the crash Currently, we do (1). That certainly has its place but most data stores don't do this if they crash, since it would lead to data loss. Not just a couple of rows either - serious, major data loss if you put the wrong kind of data in it. We even delete data that has been safely on disk for weeks, months, which IMHO some people could easily get confused about. In the future, I would like to work on (2), which preserves as much data as possible, while recognising indexes may be damaged. I don't really have any name for this, since the current naming seems to assume there is only one kind of unlogged table. My implementation path for that would be to add a crash_number onto pg_control and pg_index. Any index marked as unlogged, persistent would only be usable if its crash number is the same as the current system crash number. REINDEX would update the index crash number to the current value. That also allows us to imagine a repair index command in the future as well. Heap blocks would be zeroed if they were found to be damaged, following a crash. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
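The crash_number scheme proposed above can be sketched as follows (a hypothetical model, not PostgreSQL code; the class and method names are invented): the index records the system crash number at creation, becomes unusable once a crash bumps the counter in pg_control, and REINDEX re-stamps it.

```python
# Hypothetical sketch of the crash_number proposal (invented names):
# an "unlogged, persistent" index is usable only while its recorded
# crash number matches the system-wide one in pg_control.

class PgControl:
    def __init__(self):
        self.crash_number = 0
    def note_crash_recovery(self):
        self.crash_number += 1   # bumped each time crash recovery runs

class UnloggedPersistentIndex:
    def __init__(self, control):
        self.control = control
        self.crash_number = control.crash_number  # stamped at creation
    def is_usable(self):
        return self.crash_number == self.control.crash_number
    def reindex(self):
        # REINDEX rebuilds the index and stamps the current crash number
        self.crash_number = self.control.crash_number

control = PgControl()
idx = UnloggedPersistentIndex(control)
assert idx.is_usable()
control.note_crash_recovery()    # system crashed and recovered
assert not idx.is_usable()       # index must be ignored until rebuilt
idx.reindex()
assert idx.is_usable()
```

The same stamp-and-compare mechanism is what would make a future "repair index" command possible: any operation that restores the index to a known-good state just updates its crash number.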
Re: [HACKERS] Unlogged tables, persistent kind
On Sun, Apr 24, 2011 at 6:22 PM, Simon Riggs si...@2ndquadrant.com wrote: My implementation path for that would be to add a crash_number onto pg_control and pg_index. Any index marked as unlogged, persistent would only be usable if its crash number is the same as the current system crash number. REINDEX would update the index crash number to the current value. That also allows us to imagine a repair index command in the future as well. This seems useful for non-crash-safe indexes in general. Heap blocks would be zeroed if they were found to be damaged, following a crash. How do you propose to detect that? Until we solve the whole checksum story I don't think we have a reliable way to detect bad pages. And in some cases where we do detect them, we would detect them by crashing. -- greg
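The detection problem Greg raises is what block checksums would address. A minimal sketch of the idea (invented layout, purely illustrative; real PostgreSQL page checksums did not exist at the time of this thread and work differently): store a CRC with each page on write, verify it on read, and report damage instead of crashing.

```python
# Minimal illustration of block-level checksums (not PostgreSQL's
# actual page format): a 4-byte CRC prefixed to each page payload.
import zlib

PAGE_SIZE = 8192

def write_page(payload: bytes) -> bytes:
    assert len(payload) == PAGE_SIZE - 4
    crc = zlib.crc32(payload)
    return crc.to_bytes(4, "little") + payload

def read_page(page: bytes) -> bytes:
    stored = int.from_bytes(page[:4], "little")
    payload = page[4:]
    if zlib.crc32(payload) != stored:
        # A damaged page is reported, so it could be zeroed or repaired
        # rather than fed to code that might crash on garbage pointers.
        raise IOError("damaged page detected")
    return payload

page = write_page(b"x" * (PAGE_SIZE - 4))
read_page(page)                                 # clean page verifies
torn = page[:100] + b"\0" * 50 + page[150:]     # simulate a torn write
# read_page(torn) would raise IOError instead of crashing the server
```

As the thread notes, without something like this there is no reliable way to know which heap blocks to zero after a crash.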
Re: [HACKERS] Unlogged tables, persistent kind
Simon Riggs si...@2ndquadrant.com writes: If you don't log changes to tables you have two choices if we crash 1) truncate the table and any indexes 2) rebuild any indexes damaged by the crash No, you have only one choice, and that's (1), because there's no guarantee that what's in the table file is meaningful. Heap blocks would be zeroed if they were found to be damaged, following a crash. This is sheerest fantasy. And even if you could implement it, what sort of feature would you be offering? Your data is preserved except when it isn't? People who want that can go use mysql. regards, tom lane
Re: [HACKERS] Unlogged tables, persistent kind
On Sun, Apr 24, 2011 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: If you don't log changes to tables you have two choices if we crash 1) truncate the table and any indexes 2) rebuild any indexes damaged by the crash No, you have only one choice, and that's (1), because there's no guarantee that what's in the table file is meaningful. Heap blocks would be zeroed if they were found to be damaged, following a crash. This is sheerest fantasy. And even if you could implement it, what sort of feature would you be offering? Your data is preserved except when it isn't? People who want that can go use mysql. AFAIUI, a great many people do. I am proposing a non-default mode, requiring explicit activation by the user, which preserves as much data as possible. I am fully aware that what is proposed is not an optimisation, but a downgrading of normal resilience in exchange for some data loss in the event of a crash. Yes, many other systems support this and people are becoming persuaded that such risk/reward choices make sense for them. I see no reason not to provide an option to do this, so people can make informed choices. For large sets of low value data, it makes sense. Deleting all data simply because some of it might be damaged is not the only option. IMHO deleting all the data is a surprising option that will cause many people to curse us. I don't see preserving some of the data as being worse. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Unlogged tables, persistent kind
On Sun, Apr 24, 2011 at 6:15 PM, Simon Riggs si...@2ndquadrant.com wrote: For large sets of low value data, it makes sense. Deleting all data, just simply because some of it might be damaged, is not the only option. IMHO deleting all the data is a surprising option that will cause many people to curse us. I don't see preserving some of the data as being worse. For the cache table case, it is *certainly* reasonable to delete everything upon discovering the risk that some might be damaged. I have seen cases in production where the fix to 'oh, looks like something's corrupted' is indeed to truncate the cache, where that has become a Standard Operating Procedure. Sure, to have Postgres do this is a bit heavy-handed, but it's not as if this isn't going to be heavily documented with warnings like: Contains live plague bacteria. Beware the Rabid Hippopotami. May cause bleeding at the eyes. If your database crashes, this table WILL get truncated at startup time. If the docs are short on warnings, that should probably get rectified. (I'm allowed to be volunteered to do so :-).) I'd actually find it unsurprising for such tables to get truncated even on a clean restart; I'd favor that being an option, as long as making it thus were generally unproblematic. If the application using such a table can't cope with that logic, better to induce an understanding of that sooner rather than later ;-). It seems like a losing battle to try terribly hard to keep the data around when, by marking it unlogged, the data definition specifically warned that this was risky. I'd not go so far as to suggest having autovac TRUNCATE such tables at random intervals, but that's a pathology that's not completely incompatible with the DDL declaration :-) -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this?
Re: [HACKERS] Unlogged tables, persistent kind
On Sun, Apr 24, 2011 at 7:41 PM, Greg Stark gsst...@mit.edu wrote: On Sun, Apr 24, 2011 at 6:22 PM, Simon Riggs si...@2ndquadrant.com wrote: My implementation path for that would be to add a crash_number onto pg_control and pg_index. Any index marked as unlogged, persistent would only be usable if its crash number is the same as the current system crash number. REINDEX would update the index crash number to the current value. That also allows us to imagine a repair index command in the future as well. This seems useful for non-crash-safe indexes in general. Heap blocks would be zeroed if they were found to be damaged, following a crash. How do you propose to detect that? Until we solve the whole checksum story I don't think we have a reliable way to detect bad pages. And in some cases where we do detect them, we would detect them by crashing. That should be changed. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Unlogged tables, persistent kind
On Sun, Apr 24, 2011 at 11:15 PM, Simon Riggs si...@2ndquadrant.com wrote: IMHO deleting all the data is a surprising option that will cause many people to curse us. I don't see preserving some of the data as being worse. What possible damage do you want to recover from? Without WAL logging after a software crash it's possible for update chains to be broken, for multiple copies of the same tuple to be visible, for some tuples to disappear but not others, etc. And without checksums after a hardware crash it'll be possible for pages to be torn, resulting in tuple pointers that land in the middle of nowhere or tuples that start off fine but are half overwritten with unrelated garbage. -- greg