Re: [HACKERS] Detecting corrupted pages earlier

2003-04-04 Tom Lane
Kevin Brown <[EMAIL PROTECTED]> writes: > Shouldn't each variable listed in postgresql.conf.sample have comments > right above it explaining what it does anyway? Not really --- if you can't be bothered to consult the Admin Guide when in doubt, you have no business editing the config file. A word

2003-04-04 Kevin Brown
Andrew Sullivan wrote: > On Thu, Apr 03, 2003 at 02:39:17PM -0500, Tom Lane wrote: > > just not listing zero_damaged_pages in postgresql.conf.sample? We > > already have several variables deliberately not listed there ... > > Hey, that might be a good solution. Of course, it doesn't solve the >

2003-04-03 Vincent van Leeuwen
On 2003-04-03 18:40:54 -0500, Tom Lane wrote: > Vincent van Leeuwen <[EMAIL PROTECTED]> writes: > > ... This cost us about 10 hours downtime. If I'd had the option I just > > would've set ZERO_DAMAGED_PAGES to true and let it run for a few days to sort > > itself out. > > Yikes. If I understand t

2003-04-03 Tom Lane
Vincent van Leeuwen <[EMAIL PROTECTED]> writes: > ... This cost us about 10 hours downtime. If I'd had the option I just > would've set ZERO_DAMAGED_PAGES to true and let it run for a few days to sort > itself out. Yikes. If I understand this correctly, you had both critical data and cache data i

2003-04-03 Vincent van Leeuwen
On 2003-04-02 16:18:33 -0500, Tom Lane wrote: > Kevin Brown <[EMAIL PROTECTED]> writes: > > Hmm...I don't know that I'd want to go that far -- setting this > > variable could be regarded as a policy decision. Some shops may have > > very good reason for running with ZERO_DAMAGED_PAGES enabled all

2003-04-03 Andrew Sullivan
On Thu, Apr 03, 2003 at 02:39:17PM -0500, Tom Lane wrote: > just not listing zero_damaged_pages in postgresql.conf.sample? We > already have several variables deliberately not listed there ... Hey, that might be a good solution. Of course, it doesn't solve the "doomsday device" problem, but nobo

2003-04-03 Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Tom Lane writes: >> Andrew Sullivan expressed concern about this, too. The thing could >> be made a little more failsafe if we made it impossible to set >> ZERO_DAMAGED_PAGES to true in postgresql.conf, or by any means other >> than an actual SET comm

2003-04-03 Peter Eisentraut
Tom Lane writes: > Andrew Sullivan expressed concern about this, too. The thing could > be made a little more failsafe if we made it impossible to set > ZERO_DAMAGED_PAGES to true in postgresql.conf, or by any means other > than an actual SET command --- whose impact would then be limited to > th

2003-04-02 Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Huh, and what if I accidentaly mistype the number and destroy a valid > page? Maybe the command should only succeed if it confirms that the > page is corrupted. Good point ... but what if the corruption is subtle enough that the automatic tests don't n

2003-04-02 Tom Lane
Andrew Sullivan <[EMAIL PROTECTED]> writes: > You know you have big-trouble, oh-no, ISP ran over > the tapes while they were busy pitching magnets through your cage, > data corruption problems, and this is your best hope for recovery? > Great. Log in, turn on this option, and start working. But

2003-04-02 Alvaro Herrera
On Wed, Apr 02, 2003 at 11:10:13PM -0500, Tom Lane wrote: > What I'd really prefer to see is not a ZERO_DAMAGED_PAGES setting, > but an explicit command to "DESTROY PAGE n OF TABLE foo". That would > make you manually admit defeat for each individual page before it'd > drop data. But I don't pre

2003-04-02 Christopher Kings-Lynne
> What I'd really prefer to see is not a ZERO_DAMAGED_PAGES setting, > but an explicit command to "DESTROY PAGE n OF TABLE foo". That would > make you manually admit defeat for each individual page before it'd > drop data. But I don't presently have time to implement such a command > (any volunte

2003-04-02 Bruce Momjian
Tom Lane wrote: > Andrew Sullivan <[EMAIL PROTECTED]> writes: > > You know you have big-trouble, oh-no, ISP ran over > > the tapes while they were busy pitching magnets through your cage, > > data corruption problems, and this is your best hope for recovery? > > Great. Log in, turn on this option

2003-04-02 Andrew Sullivan
On Wed, Apr 02, 2003 at 02:07:02PM -0700, Jason Earl wrote: > There are some cases where this particular feature would be useful. > What needs to be done is to make the feature less dangerous to the > newbie without making it less useful to the person who actually needs > the functionality. I'll

2003-04-02 Tom Lane
Kevin Brown <[EMAIL PROTECTED]> writes: > Hmm...I don't know that I'd want to go that far -- setting this > variable could be regarded as a policy decision. Some shops may have > very good reason for running with ZERO_DAMAGED_PAGES enabled all the > time, but I don't know what those reasons might

2003-04-02 Jason Earl
Andrew Sullivan <[EMAIL PROTECTED]> writes: > On Wed, Apr 02, 2003 at 03:25:58PM -0500, Tom Lane wrote: > > > the current session. This is kind of an ugly wart on the GUC mechanism, > > but I think not difficult to do with an assign_hook (it just has to > > refuse non-interactive settings). > >

2003-04-02 Kevin Brown
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > Perhaps better would be to throw a message message any time it is turned > > on, reminding them it should not be left on. Is that cleaner? > > Where are you going to throw a message to, if it's in postgresql.conf? > > Bleating in th

2003-04-02 Kevin Brown
Tom Lane wrote: > Kevin Brown <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> Basically, one should only turn this variable on after giving up on the > >> possibility of getting any data out of the broken page itself. It would > >> be folly to run with it turned on as a normal setting. > > >

2003-04-02 Andrew Sullivan
On Wed, Apr 02, 2003 at 03:25:58PM -0500, Tom Lane wrote: > the current session. This is kind of an ugly wart on the GUC mechanism, > but I think not difficult to do with an assign_hook (it just has to > refuse non-interactive settings). It may be an ugly wart, but I think it's only prudent. I'
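
The assign-hook idea quoted here can be sketched in C. Everything below is hypothetical: the `SettingSource` enum and the `assign_zero_damaged_pages()` signature are invented for illustration and are not PostgreSQL's actual GUC hook interface. The hook accepts `false` from any source but accepts `true` only from an explicit SET, so a line in postgresql.conf cannot switch the option on server-wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical list of places a setting can come from. */
typedef enum {
    SOURCE_DEFAULT,      /* built-in default */
    SOURCE_CONFIG_FILE,  /* postgresql.conf */
    SOURCE_CLIENT_SET    /* an explicit SET issued in the current session */
} SettingSource;

/*
 * Hypothetical assign hook: turning the option off is always allowed;
 * turning it on is allowed only from an interactive SET, so its effect
 * stays confined to the session that asked for it.
 */
static bool assign_zero_damaged_pages(bool newval, SettingSource source)
{
    assert(source == SOURCE_DEFAULT || source == SOURCE_CONFIG_FILE ||
           source == SOURCE_CLIENT_SET);
    if (newval && source != SOURCE_CLIENT_SET) {
        fprintf(stderr, "zero_damaged_pages may only be enabled with SET\n");
        return false;  /* reject: the previous value stays in effect */
    }
    return true;
}
```

Because clearing the option is never refused, resetting it to false always succeeds regardless of where the request comes from.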

2003-04-02 Tom Lane
Kevin Brown <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Basically, one should only turn this variable on after giving up on the >> possibility of getting any data out of the broken page itself. It would >> be folly to run with it turned on as a normal setting. > This statement should *defini

2003-03-31 Kevin Brown
Tom Lane wrote: > Here's what I put in --- feel free to suggest better wording. > > ZERO_DAMAGED_PAGES (boolean) > > Detection of a damaged page header normally causes PostgreSQL to > report an error, aborting the current transaction. Setting > zero_damaged_pages to true causes the

2003-03-30 Tom Lane
Kevin Brown <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Basically, one should only turn this variable on after giving up on the >> possibility of getting any data out of the broken page itself. It would >> be folly to run with it turned on as a normal setting. > This statement should *defini

2003-03-29 Kevin Brown
Tom Lane wrote: > Kris Jurka <[EMAIL PROTECTED]> writes: > > Is zeroing the pages the only / best option? > > It's the only way to avoid a core dump when the system tries to process > the page. And no, I don't want to propagate the notion that "this page > is broken" beyond the buffer manager, so

2003-03-28 Tom Lane
Kris Jurka <[EMAIL PROTECTED]> writes: > Is zeroing the pages the only / best option? It's the only way to avoid a core dump when the system tries to process the page. And no, I don't want to propagate the notion that "this page is broken" beyond the buffer manager, so testing elsewhere isn't an

2003-03-28 Kris Jurka
On Fri, 28 Mar 2003, Tom Lane wrote: > Hiroshi Inoue <[EMAIL PROTECTED]> writes: > > How about adding a new option to skip corrupted pages ? > > I have committed changes to implement checking for damaged page headers, > along the lines of last month's discussion. It includes a GUC variable > to

2003-03-28 Tom Lane
Hiroshi Inoue <[EMAIL PROTECTED]> writes: > How about adding a new option to skip corrupted pages ? I have committed changes to implement checking for damaged page headers, along the lines of last month's discussion. It includes a GUC variable to control the response as suggested by Hiroshi. Giv
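
A rough C sketch of the two responses discussed here for a page whose header fails validation: raise an error, or, when the setting is on, warn and zero the page so nothing downstream tries to interpret its garbage offsets. The `zero_damaged_pages` flag, `handle_read_page()`, the message text and the 8192-byte page size are stand-ins assumed for illustration. Tom describes the error path as marking the buffer as having an I/O error and calling elog(ERROR); this sketch returns a status code instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 8192   /* assumed page size */

typedef enum { PAGE_OK, PAGE_ZEROED, PAGE_ERROR } PageReadResult;

/* Hypothetical stand-in for the zero_damaged_pages setting. */
static bool zero_damaged_pages = false;

/*
 * Decide what to do with a freshly read page whose header the caller has
 * already validated (header_ok). Zeroing discards the whole page, so no
 * later code ever sees the bogus pd_lower/pd_upper values.
 */
static PageReadResult handle_read_page(unsigned char *page, unsigned blkno,
                                       bool header_ok)
{
    assert(page != NULL);
    if (header_ok)
        return PAGE_OK;
    if (zero_damaged_pages) {
        fprintf(stderr, "WARNING: damaged page header in block %u; zeroing page\n",
                blkno);
        memset(page, 0, BLOCK_SIZE);
        return PAGE_ZEROED;
    }
    fprintf(stderr, "ERROR: damaged page header in block %u\n", blkno);
    return PAGE_ERROR;
}
```

Note that the error path leaves the page bytes untouched, which matches the thread's advice that zeroing should only be chosen after giving up on recovering anything from the page.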

2003-02-19 Hiroshi Inoue
Tom Lane wrote: > > Hiroshi Inoue <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> Hiroshi Inoue <[EMAIL PROTECTED]> writes: > >>> Is there a way to make our way around the pages ? > >> > >> If the header is corrupt, I don't think so. > > > What I asked is how to re

2003-02-18 Tom Lane
Hiroshi Inoue <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Hiroshi Inoue <[EMAIL PROTECTED]> writes: >>> Is there a way to make our way around the pages ? >> >> If the header is corrupt, I don't think so. > What I asked is how to read all other sane pages. Oh, I see. You can do "SELECT ...

2003-02-18 Hiroshi Inoue
Tom Lane wrote: > > Hiroshi Inoue <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> I'm thinking of modifying ReadBuffer() so that it errors out if the > > > What does the *error out* mean ? > > Mark the buffer as having an I/O error and then elog(ERROR). > > >

2003-02-18 Tom Lane
Hiroshi Inoue <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> I'm thinking of modifying ReadBuffer() so that it errors out if the > What does the *error out* mean ? Mark the buffer as having an I/O error and then elog(ERROR). > Is there a way to make our way around the pages ? If the header is

2003-02-18 Hiroshi Inoue
Tom Lane wrote: > > Postgres has a bad habit of becoming very confused if the page header of > a page on disk has become corrupted. In particular, bogus values in the > pd_lower field tend to make it look like there are many more tuples than > there really are, and of course these
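
To make the pd_lower problem concrete, here is a minimal C sketch of a header sanity check. It assumes a simplified, hypothetical header (`PageHeaderSketch`) holding only the offsets pd_lower, pd_upper and pd_special, an 8192-byte page, a 20-byte fixed header and 4-byte line pointers; it is not the check that was eventually committed. The number of tuples a page appears to hold is derived from pd_lower, so a garbage value there inflates it, while requiring the offsets to be ordered and to stay inside the page rejects a pd_lower that runs past pd_upper before any tuple is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 8192        /* assumed page size */
#define HEADER_SIZE 20         /* assumed size of the fixed page header */
#define LINE_POINTER_SIZE 4    /* assumed size of one line pointer */

/* Hypothetical, simplified header: only the offsets that matter here. */
typedef struct {
    uint16_t pd_lower;    /* end of the line-pointer array */
    uint16_t pd_upper;    /* start of tuple data */
    uint16_t pd_special;  /* start of the special space */
} PageHeaderSketch;

/* How many tuples the page claims to hold; a garbage pd_lower inflates this. */
static unsigned line_pointer_count(const PageHeaderSketch *h)
{
    assert(h != NULL);
    if (h->pd_lower < HEADER_SIZE)
        return 0;
    return (unsigned)(h->pd_lower - HEADER_SIZE) / LINE_POINTER_SIZE;
}

/* Reject headers whose offsets are out of order or point past the page. */
static bool page_header_is_sane(const PageHeaderSketch *h)
{
    assert(h != NULL);
    return h->pd_lower >= HEADER_SIZE &&
           h->pd_lower <= h->pd_upper &&
           h->pd_upper <= h->pd_special &&
           h->pd_special <= BLOCK_SIZE;
}
```

With pd_lower = 60000, the unchecked count would be 14995 line pointers on an 8 KB page; the check rejects that header because pd_lower exceeds pd_upper.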

2003-02-18 Greg Copeland
On Mon, 2003-02-17 at 22:04, Tom Lane wrote: > Curt Sampson <[EMAIL PROTECTED]> writes: > > On Mon, 17 Feb 2003, Tom Lane wrote: > >> Postgres has a bad habit of becoming very confused if the page header of > >> a page on disk has become corrupted. > > > What typically causes this corruption? > >

2003-02-18 Hannu Krosing
Tom Lane kirjutas T, 18.02.2003 kell 17:21: > Kevin Brown <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> The cases I've been able to study look like the header and a lot of the > >> following page data have been overwritten with garbage --- when it made > >> any sense at all, it looked like t

2003-02-18 Tom Lane
Kevin Brown <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> The cases I've been able to study look like the header and a lot of the >> following page data have been overwritten with garbage --- when it made >> any sense at all, it looked like the contents of non-Postgres files (eg, >> plain text),

2003-02-18 Kevin Brown
Tom Lane wrote: > The cases I've been able to study look like the header and a lot of the > following page data have been overwritten with garbage --- when it made > any sense at all, it looked like the contents of non-Postgres files (eg, > plain text), which is why I mentioned the possibility of d

2003-02-17 Curt Sampson
On Tue, 18 Feb 2003, Tom Lane wrote: > The header is only a dozen or two bytes long, so torn-page syndrome > won't result in header corruption. No. But the checksum would detect both header corruption and torn pages. Two for the price of one. But I don't think it's worth changing the page layout

2003-02-17 Tom Lane
Curt Sampson <[EMAIL PROTECTED]> writes: > Well, I wasn't proposing the whole page, just the header. That would be > significantly cheaper (in fact, there's no real need even for a CRC; > probably just xoring all of the words in the header into one word would > be fine) and would tell you if the pa
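
Curt's cheaper alternative to a full-page CRC, XORing all the header words into one word, can be sketched in C as follows. The function names, and the idea of keeping the check word at some fixed spot on the page, are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fold every 32-bit word of the header into a single check word. */
static uint32_t header_xor(const uint32_t *words, size_t nwords)
{
    assert(words != NULL || nwords == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < nwords; i++)
        acc ^= words[i];
    return acc;
}

/* Compare against a check word stored elsewhere on the page (hypothetical). */
static bool header_matches(const uint32_t *words, size_t nwords, uint32_t stored)
{
    return header_xor(words, nwords) == stored;
}
```

An XOR fold is cheap but weak: errors that cancel out, such as the same bit flipped in two words or two words trading places, leave the check word unchanged.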

2003-02-17 Curt Sampson
On Mon, 17 Feb 2003, Tom Lane wrote: > Curt Sampson <[EMAIL PROTECTED]> writes: > > > If it's any kind of a serious problem, maybe it would be worth keeping > > a CRC of the header at the end of the page somewhere. > > See past discussions about keeping CRCs of page contents. Ultimately > I think

2003-02-17 Bruce Momjian
Tom Lane wrote: > Curt Sampson <[EMAIL PROTECTED]> writes: > > On Mon, 17 Feb 2003, Tom Lane wrote: > >> Postgres has a bad habit of becoming very confused if the page header of > >> a page on disk has become corrupted. > > > What typically causes this corruption? > > Well, I'd like to know that

2003-02-17 Tom Lane
Curt Sampson <[EMAIL PROTECTED]> writes: > On Mon, 17 Feb 2003, Tom Lane wrote: >> Postgres has a bad habit of becoming very confused if the page header of >> a page on disk has become corrupted. > What typically causes this corruption? Well, I'd like to know that too. I have seen some cases tha

2003-02-17 Curt Sampson
On Mon, 17 Feb 2003, Tom Lane wrote: > Postgres has a bad habit of becoming very confused if the page header of > a page on disk has become corrupted. What typically causes this corruption? If it's any kind of a serious problem, maybe it would be worth keeping a CRC of the header at the end of t

2003-02-17 Sailesh Krishnamurthy
> "Tom" == Tom Lane <[EMAIL PROTECTED]> writes: Tom> Postgres has a bad habit of becoming very confused if the Tom> page header of a page on disk has become corrupted. In Tom> particular, bogus values in the pd_lower field tend to make I haven't read this piece of pgsql code very