Re: [HACKERS] 8.1 index corruption woes

2008-07-14 Thread Decibel!

On Jul 11, 2008, at 2:17 PM, Tom Lane wrote:

Alvaro Herrera [EMAIL PROTECTED] writes:

I don't want to discard this idea, because we're getting a very
unusually high number of bogus entries.  However, they are all (or a
very high percentage of them) the very first entry on each index  
page.

I want to confirm that the leftmost on a leaf btree page is a valid
item, and not something like the lower bound value?  (I think we only
store high bounds on internal pages, not leaf pages, but I'm not  
sure).


Er ... no.  Per nbtree/README:

: On a page that is not rightmost in its tree level, the high key is
: kept in the page's first item, and real data items start at item 2.
: The link portion of the high key item goes unused.  A page that is
: rightmost has no high key, so data items start with the first  
item.
: Putting the high key at the left, rather than the right, may seem  
odd,

: but it avoids moving the high key as we add data items.

(Right offhand, it looks like _bt_split just copies the item that it's
cloning the index key of.  Maybe it would be worth setting the item
pointer invalid, to prevent confusion in future?)


Another option would be to change pg_filedump to understand what the  
first item in the index is.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] 8.1 index corruption woes

2008-07-11 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Tom Lane wrote:
  Alvaro Herrera [EMAIL PROTECTED] writes:
   We've detected what I think is some sort of index corruption in 8.1.
   The server is running 8.1.11, so AFAICT the problem with truncated pages
   in vacuum is already patched and accounted for (i.e. we reindexed, and a
   bit later the problem presented itself again).  There haven't been any
   relevant fixes after that AFAICT.
  
   What we see is that after a bit of updating the index, it starts having
   tuples that poing to heap entries which are marked unused.
  
  Do you actually see any observed problem, or is this conclusion based
  entirely on your pg_filedump analysis?
 
 Well, yeah, this all started because the guys started getting weird
 results in queries, and found out that disabling index scans returned
 different results.

I neglected to mention that further analysis of the failed index scans
showed that index entries were pointing to heap tuples with completely
different data.

I was tracking one of those, when I noticed that in the same index page
was an entry pointing to a heap entry marked as unused.

  I suspect a problem with your analysis script, although a quick scan
  of the code didn't find an issue.

I don't want to discard this idea, because we're getting a very
unusually high number of bogus entries.  However, they are all (or a
very high percentage of them) the very first entry on each index page.
I want to confirm that the leftmost on a leaf btree page is a valid
item, and not something like the lower bound value?  (I think we only
store high bounds on internal pages, not leaf pages, but I'm not sure).


  Another point to keep in mind, if you are trying to analyze files
  belonging to a live database, is that what you can see in the filesystem
  may not be the current contents of every page.  For typical access
  patterns it'd be unsurprising for the visible index pages to lag behind
  those of the heap, since they'd be hotter and tend to stay in shared
  buffers longer.

It was confirmed that the servers are not live, and a checkpoint has
been executed by the recovery code.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.1 index corruption woes

2008-07-11 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 I don't want to discard this idea, because we're getting a very
 unusually high number of bogus entries.  However, they are all (or a
 very high percentage of them) the very first entry on each index page.
 I want to confirm that the leftmost on a leaf btree page is a valid
 item, and not something like the lower bound value?  (I think we only
 store high bounds on internal pages, not leaf pages, but I'm not sure).

Er ... no.  Per nbtree/README:

: On a page that is not rightmost in its tree level, the high key is
: kept in the page's first item, and real data items start at item 2.
: The link portion of the high key item goes unused.  A page that is
: rightmost has no high key, so data items start with the first item.
: Putting the high key at the left, rather than the right, may seem odd,
: but it avoids moving the high key as we add data items.

(Right offhand, it looks like _bt_split just copies the item that it's
cloning the index key of.  Maybe it would be worth setting the item
pointer invalid, to prevent confusion in future?)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.1 index corruption woes

2008-07-08 Thread Decibel!

On Jul 7, 2008, at 7:39 PM, Tom Lane wrote:

Another point to keep in mind, if you are trying to analyze files
belonging to a live database, is that what you can see in the  
filesystem

may not be the current contents of every page.  For typical access
patterns it'd be unsurprising for the visible index pages to lag  
behind

those of the heap, since they'd be hotter and tend to stay in shared
buffers longer.



None of the tests were done on the production database. Most of the  
checks were not done on a PITR restore; they were done on a SAN-level  
snapshot that had been run through the recovery process (startup  
postmaster on snapshot, let it recover, shut down).


I hadn't thought about checkpointing; I'll make sure to do that next  
time we take a snapshot.


We also analyzed a single table from a completely different (much  
larger) database. In that case the analysis was done on a PITR- 
recovered slave that was up and running, but nothing should have  
been writing to the table at all, and it would have been up long  
enough that it would have checkpointed after exiting PITR recovery  
(though IIRC there's a manual checkpoint done at exit of PITR  
recovery anyway). That check didn't show as many questionable index  
pointers, but there were some (again, the bulk of them were index  
pointers that were using the first line pointer slot in the index page).

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature


[HACKERS] 8.1 index corruption woes

2008-07-07 Thread Alvaro Herrera
Hi,

We've detected what I think is some sort of index corruption in 8.1.
The server is running 8.1.11, so AFAICT the problem with truncated pages
in vacuum is already patched and accounted for (i.e. we reindexed, and a
bit later the problem presented itself again).  There haven't been any
relevant fixes after that AFAICT.

What we see is that after a bit of updating the index, it starts having
tuples that poing to heap entries which are marked unused.

I detected one of these almost by accident, and then built a tool to
discover them by examining pg_filedump output.  Basically what it does
is scan the heap, note which heap tuples are marked unused, and then
scan the index and for each index tuple in leaf pages, see if it points
to an unused heap tuple.  The number of occurrences is amazingly high.
Right after a reindex there isn't any occurrence; but after a while of
application load, a lot of them appear.

I catalogued this as index corruption: the theory is that as soon as the
unused heap tuple is reused, the index will have a pointer with the
wrong index key pointing to a live heap tuple.

(We saw an occurrence of this too, but I wasn't motivated enough to
write a tool to verify the data in index vs. heap tuples.)

However, seeing the high prevalence of the problem, I started
considering whether the tool is misreading the output anyway -- i.e.
maybe it's a bug in the tool, or a misconception on my part.

The tool output looks like this:

INDEX PTR TO UNUSED HEAP: index 273375 (78,18) - (5530, 17)
INDEX PTR TO UNUSED HEAP: index 273375 (96,84) - (5436, 3)
INDEX PTR TO UNUSED HEAP: index 273375 (111,1) - (1317, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (145,1) - (1665, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (174,1) - (2656, 17)
INDEX PTR TO UNUSED HEAP: index 273375 (199,1) - (1745, 21)
INDEX PTR TO UNUSED HEAP: index 273375 (207,1) - (358, 26)
INDEX PTR TO UNUSED HEAP: index 273375 (214,1) - (2800, 17)
(many more lines of the same stuff)

What this means (first line) is that on index 273375, page 78, offset
18, there's a pointer to heap page 5530, offset 17; but that heap offset
shows up as unused.  This is from the heap:

Block 5530 
Header -
 Block Offset: 0x02b34000 Offsets: Lower 136 (0x0088)
 Block: Size 8192  Version3Upper 992 (0x03e0)
 LSN:  logid130 recoff 0xd53b3090  Special  8192 (0x2000)
 Items:   29   Free Space:  856
 Length (including item array): 140

[ ... snip other items ...]

 Item  17 -- Length:0  Offset:  504 (0x01f8)  Flags: 0x00



This is from the index:

Block   78 
Header -
 Block Offset: 0x0009c000 Offsets: Lower 788 (0x0314)
 Block: Size 8192  Version3Upper5104 (0x13f0)
 LSN:  logid131 recoff 0x01b3c6e0  Special  8176 (0x1ff0)
 Items:  192   Free Space: 4316
 Length (including item array): 792

[ ... snip other items ... ]

 Item  18 -- Length:   16  Offset: 5184 (0x1440)  Flags: USED
  Block Id: 5530  linp Index: 17  Size: 16
  Has Nulls: 0  Has Varwidths: 0


Of course, we're very concerned about this problem.  Personally I am
prepared to believe that this could be a hardware problem, because no
one else seems to be reporting this kind of thing, but I don't see how
it could cause this particular issue and not more wide-ranging data
corruption.

I would like comments on whether this is really a problem or I am just
misreading pg_filedump output.  If we confirm this, we can try to
proceed to investigate it further.

I attach the checker tool in case it is of any interest.

Thanks,

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
#!/usr/bin/perl

use warnings;
use strict;

my $PG_FILEDUMP = pg_filedump-8.1;

process_table();
exit();

sub process_table {
	my ($table, @indexes) = read_toc();

	foreach my $elem ($table, @indexes) {
		die file $elem does not exist unless -f $elem;
		if (!-f $elem.dump) {
			print STDERR generating pg_filedump for $elem\n;
			`$PG_FILEDUMP -i $elem  $elem.dump`;
		}
	}

	print STDERR loading unused line pointers for table $table\n;
	my $unused = get_heap_unused($table);

	foreach my $index (@indexes) {
		print STDERR processing index $index\n;
		process_index($unused, $index);
	}
}

# Reads a toc file, which is a description of a table and its indexes.  A
# table line is table: xxx where xxx is the relfilenode of the table, and an
# index line is index: xxx where xxx is the relfilenode of the index.
sub read_toc {
	my $table;
	my @indexes;

	print STDERR now expecting a TOC in stdin...\n;
	while () {
		if (/^table: ([0-9]+)$/) {
			$table = $1;
		}
		if (/^index: ([0-9]+)$/) {
			push @indexes, $1;
		}
	}
	print STDERR finished reading the TOC for table $table\n;
	print STDERR (indexes: , join(, , @indexes), 

Re: [HACKERS] 8.1 index corruption woes

2008-07-07 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 We've detected what I think is some sort of index corruption in 8.1.
 The server is running 8.1.11, so AFAICT the problem with truncated pages
 in vacuum is already patched and accounted for (i.e. we reindexed, and a
 bit later the problem presented itself again).  There haven't been any
 relevant fixes after that AFAICT.

 What we see is that after a bit of updating the index, it starts having
 tuples that poing to heap entries which are marked unused.

Do you actually see any observed problem, or is this conclusion based
entirely on your pg_filedump analysis?

I suspect a problem with your analysis script, although a quick scan
of the code didn't find an issue.

Another point to keep in mind, if you are trying to analyze files
belonging to a live database, is that what you can see in the filesystem
may not be the current contents of every page.  For typical access
patterns it'd be unsurprising for the visible index pages to lag behind
those of the heap, since they'd be hotter and tend to stay in shared
buffers longer.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.1 index corruption woes

2008-07-07 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  We've detected what I think is some sort of index corruption in 8.1.
  The server is running 8.1.11, so AFAICT the problem with truncated pages
  in vacuum is already patched and accounted for (i.e. we reindexed, and a
  bit later the problem presented itself again).  There haven't been any
  relevant fixes after that AFAICT.
 
  What we see is that after a bit of updating the index, it starts having
  tuples that poing to heap entries which are marked unused.
 
 Do you actually see any observed problem, or is this conclusion based
 entirely on your pg_filedump analysis?

Well, yeah, this all started because the guys started getting weird
results in queries, and found out that disabling index scans returned
different results.

 I suspect a problem with your analysis script, although a quick scan
 of the code didn't find an issue.

Well, that's why I posted the corresponding pg_filedump output -- manual
examination shows of that shows that the script is drawing the correct
conclusions.

 Another point to keep in mind, if you are trying to analyze files
 belonging to a live database, is that what you can see in the filesystem
 may not be the current contents of every page.  For typical access
 patterns it'd be unsurprising for the visible index pages to lag behind
 those of the heap, since they'd be hotter and tend to stay in shared
 buffers longer.

Hmm, I think the files come from a PITR slave that's not online.  I'll
ask to be sure.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.1 index corruption woes

2008-07-07 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Another point to keep in mind, if you are trying to analyze files
 belonging to a live database, is that what you can see in the filesystem
 may not be the current contents of every page.  For typical access
 patterns it'd be unsurprising for the visible index pages to lag behind
 those of the heap, since they'd be hotter and tend to stay in shared
 buffers longer.

 Hmm, I think the files come from a PITR slave that's not online.  I'll
 ask to be sure.

8.1 didn't have restartpoint code, and of course bgwriter isn't running;
so I believe filesystem pages on a PITR slave could be arbitrarily far
out of date if the corresponding shared buffer got touched regularly.
Try doing the analysis on the master immediately after a CHECKPOINT
command.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.1 index corruption woes

2008-07-07 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Another point to keep in mind, if you are trying to analyze files
  belonging to a live database, is that what you can see in the filesystem
  may not be the current contents of every page.  For typical access
  patterns it'd be unsurprising for the visible index pages to lag behind
  those of the heap, since they'd be hotter and tend to stay in shared
  buffers longer.
 
  Hmm, I think the files come from a PITR slave that's not online.  I'll
  ask to be sure.
 
 8.1 didn't have restartpoint code, and of course bgwriter isn't running;
 so I believe filesystem pages on a PITR slave could be arbitrarily far
 out of date if the corresponding shared buffer got touched regularly.
 Try doing the analysis on the master immediately after a CHECKPOINT
 command.

Okay.

I'll also check out tomorrow the LSN on the involved pages to see if
there's anything obviously bogus.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers