Looping pgsql-general mail list
I see I am not clear in my question , below are the order of events we see when we get a invalid page header in block corruption -Windows server crashed/restarted due to power failure ( we believe) ( I see that write-cache / write back cache / Disk cache are enabled and we don't have battery backup ) -Posgres DB crashes (LOG: database system was not properly shut down; automatic recovery in progress ) - After around 3hrs after crash we see below messages in log 2016-03-03 20:13:18 GMT ERROR: invalid page header in block 204 of relation base/16413/16900 2016-03-03 20:13:18 GMT CONTEXT: automatic analyze of table " a.parametertable" => this is a relation not Index - select count(*) on table gives invalid page header message , where as select count(*) order by primary key does not give any issue - After clearing the blocks using zero_damaged_pages , total count of table rows using the table and order by primary key matches which means there is no dataloss ( I havent verified each record by record , I assume there is no dataloss) please correct ?? I would like to know what was the cause of invalid page header and is there any way I can reproduce this error ? Appreciate your comments and suggestions on this ? -Sreekanth On Thu, Dec 8, 2016 at 12:55 PM, Sreekanth Palluru <[email protected]> wrote: > > Forgot to loop community > > > Hi Jorge/All, > Thanks for the reply. > As per the documentation , I think backend reads the this page header and > reports that it is damaged. > I am looking at ways we re-create this scenario of creation this blank > corrupted page ? do I have any control over Backend and after it > initializes a new page using function PageInit > <https://doxygen.postgresql.org/bufpage_8c.html#ab871202326b101c6ec24b7f628157c2c> > (Page > <https://doxygen.postgresql.org/bufpage_8h.html#a2010e3258a7075b32ad5750134ab9c5c> > page, Size > <https://doxygen.postgresql.org/c_8h.html#af9ecec2d692138fab9167164a457cbd4> > pageSize, Size > <https://doxygen.postgresql.org/c_8h.html#af9ecec2d692138fab9167164a457cbd4> > specialSize) and I want to halt/crash the backend by stopping postgres > database or through Hardware/system crash ? > Is this possible ? > > Also I see that based on below comments from Source code, having such > pages in database is normal . is my understanding correct ? > > > /* > 63 * PageIsVerified > 64 * Check that the page header and checksum (if any) appear valid. > 65 * > 66 * This is called when a page has just been read in from disk. The > idea is > 67 * to cheaply detect trashed pages before we go nuts following bogus > item > 68 * pointers, testing invalid transaction identifiers, etc. > 69 * > 70 * It turns out to be necessary to allow zeroed pages here too. Even > though > 71 * this routine is *not* called when deliberately adding a page to a > relation, > 72 * there are scenarios in which a zeroed page might be found in a > table. > 73 * (Example: a backend extends a relation, then crashes before it can > write > 74 * any WAL entry about the new page. The kernel will already have the > 75 * zeroed page in the file, and it will stay that way after restart.) > So we > 76 * allow zeroed pages here, and are careful that the page access macros > 77 * treat such a page as empty and without free space. Eventually, > VACUUM > 78 * will clean up such a page and make it usable. > 79 */ > zero_damaged_pages (boolean) > > Detection of a damaged page header normally causes PostgreSQL to report > an error, aborting the current transaction. Setting zero_damaged_pages to > on causes the system to instead report a warning, zero out the damaged page > in memory, and continue processing. This behavior will destroy data, > namely all the rows on the damaged page. However, it does allow you to get > past the error and retrieve rows from any undamaged pages that might be > present in the table. It is useful for recovering data if corruption has > occurred due to a hardware or software error. You should generally not set > this on until you have given up hope of recovering data from the damaged > pages of a table. Zeroed-out pages are not forced to disk so it is > recommended to recreate the table or the index before turning this > parameter off again. The default setting is off, and it can only be > changed by a superuser. > > On Thu, Dec 8, 2016 at 9:50 AM, Jorge Torralba <[email protected]> > wrote: > >> Look into this setting. >> >> zero_damaged_pages = on; >> >> The docs should explain it. >> >> >> >> On Wed, Dec 7, 2016 at 2:47 PM, sreekanth Palluru <[email protected]> >> wrote: >> >>> Hi , >>> >>> I am getting invalid page header error and what I could observe is when >>> I select the table I get this error , where as if I select table and order >>> by primary key I can retrieve the rows from table. >>> >>> And I don't see any dataloss ( based on total number records) after >>> fixing the blocks using zero_damaged_pages=on and then vacuum full on the >>> table >>> >>> >>> Please note I have renamed few tables to avoid giving actual table names >>> >>> We run Postgres 9.2 version on Windows and Dell Optiplex XE2 server and >>> we suspect that block corruption would have happen due to >>> hardware/memory/power failures reasons and I have gone through wikik >>> https://wiki.postgresql.org/wiki/Reliable_Writes. >>> >>> I want to understand why we can read the table through where there >>> exists index and explain plan shows Index scan and with high cost compare >>> to seq scan . >>> I assume that since there no rows/data present in these corrupted blocks >>> index scan skips these blocks and hence it is not throwing the error . >>> >>> Also , I want to know what would have caused the postgres to create >>> these corrupted blocks and >>> can I reproduce this error ? appreciate if you share any pointers to >>> blogs/mailing lists if this type of issue is already discussed ? >>> >>> >>> create table a.parametertable_bak as select * from a.parametertable >>> order by id; >>> >>> labs=# select count(*) from a.parametertable_bak ; >>> count >>> ------- >>> 31415 >>> (1 row) >>> >>> >>> labs=# >>> labs=# checkpoint; >>> CHECKPOINT >>> labs=# set zero_damaged_pages=on; >>> SET >>> labs=# vacuum full a.parametertable; >>> WARNING: invalid page header in block 204 of relation base/16413/16900; >>> zeroing out page >>> WARNING: invalid page header in block 205 of relation base/16413/16900; >>> zeroing out page >>> VACUUM >>> labs=# select count(*) from a.parametertable ; >>> count >>> ------- >>> 31415 >>> (1 row) >>> >>> >>> labs=# >>> -Sreekanth >>> >> >> >> >> -- >> Thanks, >> >> Jorge Torralba >> ---------------------------- >> >> Note: This communication may contain privileged or other confidential >> information. If you are not the intended recipient, please do not print, >> copy, retransmit, disseminate or otherwise use the information. Please >> indicate to the sender that you have received this email in error and >> delete the copy you received. Thank You. >> > > > > -- > Regards > Sreekanth > > > > -- > Regards > Sreekanth > -- Regards Sreekanth
