Re: [HACKERS] Detecting corrupted pages earlier

2003-04-04 Kevin Brown
Andrew Sullivan wrote: On Thu, Apr 03, 2003 at 02:39:17PM -0500, Tom Lane wrote: just not listing zero_damaged_pages in postgresql.conf.sample? We already have several variables deliberately not listed there ... Hey, that might be a good solution. Of course, it doesn't solve the

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-04 Tom Lane
Kevin Brown writes: Shouldn't each variable listed in postgresql.conf.sample have comments right above it explaining what it does anyway? Not really --- if you can't be bothered to consult the Admin Guide when in doubt, you have no business editing the config file. A word or

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-03 Peter Eisentraut
Tom Lane writes: Andrew Sullivan expressed concern about this, too. The thing could be made a little more failsafe if we made it impossible to set ZERO_DAMAGED_PAGES to true in postgresql.conf, or by any means other than an actual SET command --- whose impact would then be limited to the

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-03 Tom Lane
Peter Eisentraut writes: Tom Lane writes: Andrew Sullivan expressed concern about this, too. The thing could be made a little more failsafe if we made it impossible to set ZERO_DAMAGED_PAGES to true in postgresql.conf, or by any means other than an actual SET command ---

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-03 Andrew Sullivan
On Thu, Apr 03, 2003 at 02:39:17PM -0500, Tom Lane wrote: just not listing zero_damaged_pages in postgresql.conf.sample? We already have several variables deliberately not listed there ... Hey, that might be a good solution. Of course, it doesn't solve the doomsday device problem, but nobody

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-03 Vincent van Leeuwen
On 2003-04-02 16:18:33 -0500, Tom Lane wrote: Kevin Brown writes: Hmm...I don't know that I'd want to go that far -- setting this variable could be regarded as a policy decision. Some shops may have very good reason for running with ZERO_DAMAGED_PAGES enabled all the

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-03 Tom Lane
Vincent van Leeuwen writes: ... This cost us about 10 hours downtime. If I'd had the option I just would've set ZERO_DAMAGED_PAGES to true and let it run for a few days to sort itself out. Yikes. If I understand this correctly, you had both critical data and cache data in

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-03 Vincent van Leeuwen
On 2003-04-03 18:40:54 -0500, Tom Lane wrote: Vincent van Leeuwen writes: ... This cost us about 10 hours downtime. If I'd had the option I just would've set ZERO_DAMAGED_PAGES to true and let it run for a few days to sort itself out. Yikes. If I understand this

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Tom Lane
Kevin Brown writes: Tom Lane wrote: Basically, one should only turn this variable on after giving up on the possibility of getting any data out of the broken page itself. It would be folly to run with it turned on as a normal setting. This statement should *definitely* go

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Andrew Sullivan
On Wed, Apr 02, 2003 at 03:25:58PM -0500, Tom Lane wrote: the current session. This is kind of an ugly wart on the GUC mechanism, but I think not difficult to do with an assign_hook (it just has to refuse non-interactive settings). It may be an ugly wart, but I think it's only prudent. I'd
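
Tom's idea of an assign hook that refuses non-interactive settings can be sketched as follows. This is a minimal illustration, not PostgreSQL's GUC code: the enum, the function name, and the hook signature (new value plus a flag for where the setting came from) are all hypothetical, and the reminder message follows Bruce's suggestion elsewhere in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record of where a setting came from. */
typedef enum
{
    SKETCH_SRC_CONFIG_FILE,  /* postgresql.conf */
    SKETCH_SRC_SESSION_SET   /* an explicit SET in the current session */
} SketchSettingSource;

/*
 * Hypothetical assign hook for zero_damaged_pages.
 * Returns true to accept the new value, false to reject it.
 * Switching the variable off is always allowed; switching it on is
 * allowed only from an interactive SET, so the config file can never
 * leave it enabled server-wide.
 */
static bool
sketch_assign_zero_damaged_pages(bool newval, SketchSettingSource source)
{
    /* Only the two sources modeled above are expected. */
    assert(source == SKETCH_SRC_CONFIG_FILE || source == SKETCH_SRC_SESSION_SET);

    if (!newval)
        return true;

    if (source != SKETCH_SRC_SESSION_SET)
    {
        fprintf(stderr, "zero_damaged_pages can only be enabled with SET\n");
        return false;
    }

    /* Remind the user that this is a last-resort recovery tool. */
    fprintf(stderr, "WARNING: zero_damaged_pages is on; damaged pages will be "
                    "zeroed and their contents lost\n");
    return true;
}
```

Turning the variable off is always accepted, so a session can back out of recovery mode; only the transition to true is gated.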

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Kevin Brown
Tom Lane wrote: Kevin Brown writes: Tom Lane wrote: Basically, one should only turn this variable on after giving up on the possibility of getting any data out of the broken page itself. It would be folly to run with it turned on as a normal setting. This statement

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Kevin Brown
Tom Lane wrote: Bruce Momjian writes: Perhaps better would be to throw a message any time it is turned on, reminding them it should not be left on. Is that cleaner? Where are you going to throw a message to, if it's in postgresql.conf? Bleating in the

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Jason Earl
Andrew Sullivan writes: On Wed, Apr 02, 2003 at 03:25:58PM -0500, Tom Lane wrote: the current session. This is kind of an ugly wart on the GUC mechanism, but I think not difficult to do with an assign_hook (it just has to refuse non-interactive settings). It may be

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Tom Lane
Kevin Brown writes: Hmm...I don't know that I'd want to go that far -- setting this variable could be regarded as a policy decision. Some shops may have very good reason for running with ZERO_DAMAGED_PAGES enabled all the time, but I don't know what those reasons might be.

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Andrew Sullivan
On Wed, Apr 02, 2003 at 02:07:02PM -0700, Jason Earl wrote: There are some cases where this particular feature would be useful. What needs to be done is to make the feature less dangerous to the newbie without making it less useful to the person who actually needs the functionality. I'll

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Bruce Momjian
Tom Lane wrote: Andrew Sullivan writes: You know you have big-trouble, oh-no, ISP ran over the tapes while they were busy pitching magnets through your cage, data corruption problems, and this is your best hope for recovery? Great. Log in, turn on this option, and

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Christopher Kings-Lynne
What I'd really prefer to see is not a ZERO_DAMAGED_PAGES setting, but an explicit command to DESTROY PAGE n OF TABLE foo. That would make you manually admit defeat for each individual page before it'd drop data. But I don't presently have time to implement such a command (any volunteers

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Alvaro Herrera
On Wed, Apr 02, 2003 at 11:10:13PM -0500, Tom Lane wrote: What I'd really prefer to see is not a ZERO_DAMAGED_PAGES setting, but an explicit command to DESTROY PAGE n OF TABLE foo. That would make you manually admit defeat for each individual page before it'd drop data. But I don't

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Tom Lane
Andrew Sullivan writes: You know you have big-trouble, oh-no, ISP ran over the tapes while they were busy pitching magnets through your cage, data corruption problems, and this is your best hope for recovery? Great. Log in, turn on this option, and start working. But

Re: [HACKERS] Detecting corrupted pages earlier

2003-04-02 Tom Lane
Alvaro Herrera writes: Huh, and what if I accidentally mistype the number and destroy a valid page? Maybe the command should only succeed if it confirms that the page is corrupted. Good point ... but what if the corruption is subtle enough that the automatic tests don't
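
Alvaro's safeguard for the proposed DESTROY PAGE command could look roughly like this. No such command exists in the thread; the function, the result enum, and the explicit override flag (one possible answer to Tom's worry about corruption subtle enough to pass the automatic checks) are hypothetical, and the header check is assumed to have been done by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192  /* assumed block size */

/* Hypothetical outcomes of a "destroy page" request. */
typedef enum
{
    DESTROY_DONE,
    DESTROY_REFUSED_PAGE_LOOKS_VALID
} SketchDestroyResult;

/*
 * Hypothetical worker for "DESTROY PAGE n OF TABLE foo".
 * header_ok: result of the caller's header sanity check.
 * override:  explicit "I know it looks fine, destroy it anyway" flag.
 * A page that still passes the check is left alone unless overridden,
 * so a mistyped page number does not silently wipe good data.
 */
static SketchDestroyResult
sketch_destroy_page(unsigned char *page, bool header_ok, bool override)
{
    assert(page != NULL);

    if (header_ok && !override)
        return DESTROY_REFUSED_PAGE_LOOKS_VALID;

    memset(page, 0, SKETCH_BLCKSZ);  /* page now reads as empty */
    return DESTROY_DONE;
}
```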

Re: [HACKERS] Detecting corrupted pages earlier

2003-03-31 Kevin Brown
Tom Lane wrote: Here's what I put in --- feel free to suggest better wording. ZERO_DAMAGED_PAGES (boolean) Detection of a damaged page header normally causes PostgreSQL to report an error, aborting the current transaction. Setting zero_damaged_pages to true causes the

Re: [HACKERS] Detecting corrupted pages earlier

2003-03-30 Tom Lane
Kevin Brown writes: Tom Lane wrote: Basically, one should only turn this variable on after giving up on the possibility of getting any data out of the broken page itself. It would be folly to run with it turned on as a normal setting. This statement should *definitely* go

Re: [HACKERS] Detecting corrupted pages earlier

2003-03-29 Kevin Brown
Tom Lane wrote: Kris Jurka writes: Is zeroing the pages the only / best option? It's the only way to avoid a core dump when the system tries to process the page. And no, I don't want to propagate the notion that this page is broken beyond the buffer manager, so testing

Re: [HACKERS] Detecting corrupted pages earlier

2003-03-28 Tom Lane
Hiroshi Inoue writes: How about adding a new option to skip corrupted pages ? I have committed changes to implement checking for damaged page headers, along the lines of last month's discussion. It includes a GUC variable to control the response as suggested by Hiroshi. Given
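
The response Tom describes (error out by default, zero the page when the GUC is set) reduces to a small decision. A sketch, assuming the header check has already been run and its result is passed in; the names and message wording are illustrative, and in the server the error branch would be an elog(ERROR) that aborts the transaction rather than a return code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192  /* assumed block size */

/* Hypothetical stand-in for the zero_damaged_pages setting. */
static bool sketch_zero_damaged_pages = false;

typedef enum
{
    PAGE_OK,      /* header passed the check; page used as-is */
    PAGE_ZEROED,  /* header bad, setting on: page wiped, query continues */
    PAGE_ERROR    /* header bad, setting off: caller must abort */
} SketchReadResult;

/*
 * Decide what to do with a freshly read page whose header has already
 * been checked.  Zeroing makes the page look empty, so later code never
 * walks garbage, but any rows it held are no longer visible.
 */
static SketchReadResult
sketch_handle_read_page(unsigned char *page, bool header_ok, unsigned blockno)
{
    assert(page != NULL);

    if (header_ok)
        return PAGE_OK;

    if (sketch_zero_damaged_pages)
    {
        fprintf(stderr, "WARNING: invalid page header in block %u; zeroing page\n",
                blockno);
        memset(page, 0, SKETCH_BLCKSZ);
        return PAGE_ZEROED;
    }

    fprintf(stderr, "ERROR: invalid page header in block %u\n", blockno);
    return PAGE_ERROR;
}
```

Defaulting the flag to false matches the thread's consensus that zeroing should be a deliberate, last-resort choice.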

Re: [HACKERS] Detecting corrupted pages earlier

2003-03-28 Kris Jurka
On Fri, 28 Mar 2003, Tom Lane wrote: Hiroshi Inoue writes: How about adding a new option to skip corrupted pages ? I have committed changes to implement checking for damaged page headers, along the lines of last month's discussion. It includes a GUC variable to control

Re: [HACKERS] Detecting corrupted pages earlier

2003-03-28 Tom Lane
Kris Jurka writes: Is zeroing the pages the only / best option? It's the only way to avoid a core dump when the system tries to process the page. And no, I don't want to propagate the notion that this page is broken beyond the buffer manager, so testing elsewhere isn't an

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-19 Hiroshi Inoue
Tom Lane wrote: Hiroshi Inoue writes: Tom Lane wrote: Hiroshi Inoue writes: Is there a way to make our way around the pages ? If the header is corrupt, I don't think so. What I asked is how to read all other sane

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-18 Kevin Brown
Tom Lane wrote: The cases I've been able to study look like the header and a lot of the following page data have been overwritten with garbage --- when it made any sense at all, it looked like the contents of non-Postgres files (eg, plain text), which is why I mentioned the possibility of

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-18 Tom Lane
Kevin Brown writes: Tom Lane wrote: The cases I've been able to study look like the header and a lot of the following page data have been overwritten with garbage --- when it made any sense at all, it looked like the contents of non-Postgres files (eg, plain text), which is

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-18 Greg Copeland
On Mon, 2003-02-17 at 22:04, Tom Lane wrote: Curt Sampson writes: On Mon, 17 Feb 2003, Tom Lane wrote: Postgres has a bad habit of becoming very confused if the page header of a page on disk has become corrupted. What typically causes this corruption? Well, I'd like

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-18 Tom Lane
Hiroshi Inoue writes: Tom Lane wrote: I'm thinking of modifying ReadBuffer() so that it errors out if the What does the *error out* mean ? Mark the buffer as having an I/O error and then elog(ERROR). Is there a way to make our way around the pages ? If the header is

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-18 Hannu Krosing
Tom Lane wrote on Tue, 18.02.2003 at 17:21: Kevin Brown writes: Tom Lane wrote: The cases I've been able to study look like the header and a lot of the following page data have been overwritten with garbage --- when it made any sense at all, it looked like the contents

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-18 Hiroshi Inoue
Tom Lane wrote: Hiroshi Inoue writes: Tom Lane wrote: I'm thinking of modifying ReadBuffer() so that it errors out if the What does the *error out* mean ? Mark the buffer as having an I/O error and then elog(ERROR). Is there a way

[HACKERS] Detecting corrupted pages earlier

2003-02-17 Tom Lane
Postgres has a bad habit of becoming very confused if the page header of a page on disk has become corrupted. In particular, bogus values in the pd_lower field tend to make it look like there are many more tuples than there really are, and of course these tuples contain garbage. That leads to
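
A sketch of the kind of consistency test that would catch a bogus pd_lower before the page is scanned. It assumes a simplified header holding only three 16-bit offsets, an 8192-byte block, and 4-byte line pointers; the struct and function names are hypothetical and do not reproduce PostgreSQL's bufpage.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLCKSZ      8192  /* assumed block size */
#define SKETCH_ITEMID_SIZE 4     /* assumed size of one line pointer */

/* Hypothetical, simplified page header: only the offset fields. */
typedef struct SketchPageHeader
{
    uint16_t pd_lower;    /* end of the line-pointer array */
    uint16_t pd_upper;    /* start of tuple data */
    uint16_t pd_special;  /* start of special space */
} SketchPageHeader;

/*
 * True if the offsets are mutually consistent:
 *   sizeof(header) <= pd_lower <= pd_upper <= pd_special <= BLCKSZ
 */
static bool
sketch_page_header_is_valid(const SketchPageHeader *hdr)
{
    assert(hdr != NULL);
    return hdr->pd_lower >= sizeof(SketchPageHeader) &&
           hdr->pd_lower <= hdr->pd_upper &&
           hdr->pd_upper <= hdr->pd_special &&
           hdr->pd_special <= SKETCH_BLCKSZ;
}

/* Line pointers the header claims; 0 if the header fails the check. */
static size_t
sketch_claimed_item_count(const SketchPageHeader *hdr)
{
    if (!sketch_page_header_is_valid(hdr))
        return 0;
    return (hdr->pd_lower - sizeof(SketchPageHeader)) / SKETCH_ITEMID_SIZE;
}
```

Ordering the offsets this way means an overshooting pd_lower can no longer be read as thousands of phantom line pointers; the page is rejected instead.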

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-17 Sailesh Krishnamurthy
Tom == Tom Lane writes: Tom Postgres has a bad habit of becoming very confused if the Tom page header of a page on disk has become corrupted. In Tom particular, bogus values in the pd_lower field tend to make I haven't read this piece of pgsql code very carefully

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-17 Curt Sampson
On Mon, 17 Feb 2003, Tom Lane wrote: Postgres has a bad habit of becoming very confused if the page header of a page on disk has become corrupted. What typically causes this corruption? If it's any kind of a serious problem, maybe it would be worth keeping a CRC of the header at the end of
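
Curt's suggestion of keeping a CRC of the header at the end of the page can be sketched with a standard bitwise CRC-32 (reflected polynomial 0xEDB88320). The 20-byte header length and the choice of the last four bytes of the page as the checksum slot are assumptions for illustration, not an actual on-disk format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLCKSZ     8192  /* assumed block size */
#define SKETCH_HEADER_LEN 20    /* assumed header length in bytes */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF). */
static uint32_t
sketch_crc32(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(len == 0 || buf != NULL);
    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Store the header's CRC in the last 4 bytes of the page (hypothetical layout). */
static void
sketch_stamp_header_crc(unsigned char *page)
{
    uint32_t crc = sketch_crc32(page, SKETCH_HEADER_LEN);

    memcpy(page + SKETCH_BLCKSZ - sizeof crc, &crc, sizeof crc);
}

/* Recompute the header CRC and compare it with the stored copy. */
static bool
sketch_header_crc_ok(const unsigned char *page)
{
    uint32_t stored;

    memcpy(&stored, page + SKETCH_BLCKSZ - sizeof stored, sizeof stored);
    return stored == sketch_crc32(page, SKETCH_HEADER_LEN);
}
```

Because the header sits at the start of the block and the checksum at the end, a partially written page can also show up as a mismatch when the header changed between the old and new versions.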

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-17 Tom Lane
Curt Sampson writes: On Mon, 17 Feb 2003, Tom Lane wrote: Postgres has a bad habit of becoming very confused if the page header of a page on disk has become corrupted. What typically causes this corruption? Well, I'd like to know that too. I have seen some cases that were

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-17 Curt Sampson
On Mon, 17 Feb 2003, Tom Lane wrote: Curt Sampson writes: If it's any kind of a serious problem, maybe it would be worth keeping a CRC of the header at the end of the page somewhere. See past discussions about keeping CRCs of page contents. Ultimately I think it's a

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-17 Tom Lane
Curt Sampson writes: Well, I wasn't proposing the whole page, just the header. That would be significantly cheaper (in fact, there's no real need even for a CRC; probably just xoring all of the words in the header into one word would be fine) and would tell you if the page
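
The cheaper alternative Curt mentions, XORing the header's words into one word, might look like this. The 20-byte header length is assumed, and memcpy is used so the sketch does not depend on the page being word-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header length in bytes; must be a multiple of 4 for this sketch. */
#define SKETCH_HEADER_LEN 20

/*
 * Fold the header into one 32-bit word by XORing its 4-byte words.
 * Much cheaper than a CRC, but weaker: identical bit flips at the same
 * position in two different words cancel out and go undetected.
 */
static uint32_t
sketch_header_xor(const unsigned char *hdr)
{
    uint32_t acc = 0;

    assert(hdr != NULL);
    assert(SKETCH_HEADER_LEN % sizeof(uint32_t) == 0);
    for (size_t off = 0; off < SKETCH_HEADER_LEN; off += sizeof(uint32_t))
    {
        uint32_t word;

        memcpy(&word, hdr + off, sizeof word);  /* avoids unaligned loads */
        acc ^= word;
    }
    return acc;
}
```

The trade-off is detection strength: a pair of flips at the same bit position in two different words leaves the XOR unchanged, whereas a CRC-32 over a header this short would detect it.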

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-17 Curt Sampson
On Tue, 18 Feb 2003, Tom Lane wrote: The header is only a dozen or two bytes long, so torn-page syndrome won't result in header corruption. No. But the checksum would detect both header corruption and torn pages. Two for the price of one. But I don't think it's worth changing the page layout

Re: [HACKERS] Detecting corrupted pages earlier

2003-02-17 Bruce Momjian
Tom Lane wrote: Curt Sampson writes: On Mon, 17 Feb 2003, Tom Lane wrote: Postgres has a bad habit of becoming very confused if the page header of a page on disk has become corrupted. What typically causes this corruption? Well, I'd like to know that too. I have