Re: [HACKERS] Btree or not btree? That is the question

2012-07-10 Thread Greg Sabino Mullane
On Mon, Jul 09, 2012 at 04:02:13PM -0400, Tom Lane wrote:
 ... Could you crank up the log verbosity so we can get 
 file and line number, at least?

Here is what the increased verbosity reveals in aggregate. This 
is about an 18-hour span, covering 12.5M transactions, on 
version 8.3.18:

(13 times) Various OIDs that do exist cannot be found:
ERROR:  XX000: could not open relation with OID 1554847444
LOCATION:  relation_open, heapam.c:879

(21 times) Relations that do exist cannot be found:
ERROR:  42P01: relation foobar does not exist
LOCATION:  RangeVarGetRelid, namespace.c:273

(1 time) Qualified relation that exists cannot be found:
ERROR:  42P01: relation public.foobar does not exist
LOCATION:  RangeVarGetRelid, namespace.c:268

(5 times) Failure to read a block:
XX001: could not read block 3 of relation 1663/1554846571/4184054438: read only 
0 of 8192 bytes
LOCATION:  mdread, md.c:631

(5 times) Cache lookup failure:
XX000: cache lookup failed for relation 1554847255
LOCATION:  has_subclass, plancat.c:921

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8


pgpBH51gd5sgI.pgp
Description: PGP signature


Re: [HACKERS] Btree or not btree? That is the question

2012-07-09 Thread Greg Sabino Mullane
I've expanded my searching a bit, to see if I can find any other 
correlations. One thing that seems to happen about 10 times a day 
is an error of this sort:

ERROR:  could not open relation with OID 1554847326

In this case, the OID in question always exists, and corresponds to 
one of a handful of particularly busy tables. Sometimes the query 
does not even touch the OID mentioned directly: in the above example, 
the SQL was an update to table A that had a FK to table B, and the 
OID above is for table B. The queries themselves vary: I've not found any 
common factor yet.

These errors have been happening a long time, and obviously don't cause the 
same database-hosed-must-restart issue the btree does, but it is still 
a little disconcerting. Although 10 times out of  20 million transactions 
per day is at least an extremely rare event :) It is definitely NOT correlated 
to 
system table reindexing, but does seem to be roughly correlated to how busy 
things are in general. We've not been able to duplicate on a non-prod test 
system yet either, which points to either hardware or (more likely) a failure 
to completely simulate the high activity level of prod.

No idea if this related to the relatively recent btree errors, but figured 
I would get it out there. There is also an even rarer sprinkling of:

ERROR:  relation with OID 3924107573 does not exist

but I figured that was probably a variant of the first error.

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8


pgpmh4I30bBvW.pgp
Description: PGP signature


Re: [HACKERS] Btree or not btree? That is the question

2012-07-09 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes:
 I've expanded my searching a bit, to see if I can find any other 
 correlations. One thing that seems to happen about 10 times a day 
 is an error of this sort:

 ERROR:  could not open relation with OID 1554847326

Is that the *entire* message?  No details?  Could you crank up the
log verbosity so we can get file and line number, at least?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Btree or not btree? That is the question

2012-07-09 Thread Greg Sabino Mullane
  ERROR:  could not open relation with OID 1554847326
 
 Is that the *entire* message?  No details?  Could you crank up the
 log verbosity so we can get file and line number, at least?

Yes, that's the entire thing, other than the statement line after it. 
In this particular case:

STATEMENT: UPDATE mytable SET foobar = 'T' WHERE id IN ($1,$2,$3,$4)

I'll start the wheels rolling for upping the verbosity.

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8


pgp1V3duXPAQS.pgp
Description: PGP signature


Re: [HACKERS] Btree or not btree? That is the question

2012-07-09 Thread Greg Sabino Mullane
 Could you crank up the log verbosity so we can get 
 file and line number, at least?

First hit since the change:

ERROR:  XX000: could not open relation with OID 1554847444
LOCATION:  relation_open, heapam.c:879
STATEMENT:  SELECT
...

Will leave the verbosity up and see if it occurs in the same place.


-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8


pgpSmpP1G2bZT.pgp
Description: PGP signature


Re: [HACKERS] Btree or not btree? That is the question

2012-06-21 Thread Greg Sabino Mullane
  ERROR:  index pg_class_oid_index is not a btree
 
 That means you got bogus data while reading the metapage.
 I'm beginning to wonder about the hardware on this server ...

This happened again, and this time I went back through 
the logs and found that it is always the exact same query causing 
the issue. I also found it occuring on different servers, 
which rules out RAM anyway (still shared disk, so those are suspect).
This query also sometimes gives errors like this:

ERROR:  could not read block 3 of relation 1663/1554846571/3925298284: 
  read only 0 of 8192 bytes

However, the final number changes: these are invariably temporary relations. 
The query itself is a GROUP BY over a large view and the explain plan is 
107 rows, with nothing esoteric about it. Most of the tables used are 
fairly common ones. I'm trying to duplicate on a non-production box, without 
success so far, and I'm loath to run it on production as it sometimes 
causes multiple backends to freeze up and requires a forceful restart.

Any ideas on how to carefully debug this? There are a couple of quicksorts 
when I explain analyze on a non-prod system, which I am guessing where 
the temp tables come from (work_mem is 24MB). I'm not sure I understand 
what could be causing both the 'read 0' and btree errors for the 
same query - bad blocks on disk for one of the underlying tables?
I'll work next on checking each of the tables the view is using.

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8


pgpCBcRgxlYYF.pgp
Description: PGP signature


Re: [HACKERS] Btree or not btree? That is the question

2012-06-21 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes:
 ERROR:  index pg_class_oid_index is not a btree
 That means you got bogus data while reading the metapage.
 I'm beginning to wonder about the hardware on this server ...

 This happened again, and this time I went back through 
 the logs and found that it is always the exact same query causing 
 the issue. I also found it occuring on different servers, 
 which rules out RAM anyway (still shared disk, so those are suspect).
 This query also sometimes gives errors like this:
 ERROR:  could not read block 3 of relation 1663/1554846571/3925298284: 
   read only 0 of 8192 bytes
 However, the final number changes: these are invariably temporary relations. 

Oh really ... okay, maybe it is a software problem then.

 Any ideas on how to carefully debug this? There are a couple of quicksorts 
 when I explain analyze on a non-prod system, which I am guessing where 
 the temp tables come from (work_mem is 24MB).

No, that error message is complaining about an attempt to read an
actual, named, relation (could be temp, no way to be sure from this
info).  A sort might generate temp files but those don't have that
kind of name.

 I'm not sure I understand 
 what could be causing both the 'read 0' and btree errors for the 
 same query - bad blocks on disk for one of the underlying tables?

The could not read thing looks more like an attempt to fetch an
invalid TID; you could get such an error for instance if you had an
index that referenced a tuple in block 3, but the table on disk isn't
that long.  So one possible theory for what's happening here is that
once in a while we get confused about which shared buffer holds which
disk block, and either find the wrong block entirely when looking for
pg_class_oid_index (the first case) or grab a page of the wrong index
in the second case.  This theory would be more plausible if you're
wrong about the second-case tables being temp, though, because if they
are temp then their indexes would be kept in local buffers not shared
buffers, making it much harder to believe in a single bug causing both
symptoms.

One possible mechanism for confusion of that sort would be if the
spinlock code wasn't quite right, or the compiler was incorrectly
moving loads/stores into or out of locked sections.  So it might be
time to ask exactly what kind of hardware this is, which compiler
PG was built with, etc.

On the other hand, if the issue were of that sort then it ought to
affect all buffers more or less at random; so if you're consistently
seeing exactly these symptoms (in particular, if it's always
pg_class_oid_index that's complained of), then I'm not sure I believe
this theory either.

Which PG version again?  Are you in the habit of doing VACUUM FULLs
on system catalogs, and if so do these glitches correlate at all
with such activities?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Btree or not btree? That is the question

2012-06-21 Thread Greg Sabino Mullane
I dug through the logs and found some other occurances of 
the could not read block errors. Some on dirt simple 
SELECT queries. Nothing else has generated the btree 
error yet. About 35 found in the last month.

 This theory would be more plausible if you're wrong about the second-case 
 tables being temp, though, because if they are temp then their indexes 
 would be kept in local buffers not shared buffers, making it much harder 
 to believe in a single bug causing both symptoms.

I grepped the last month of logs and found about 20 instances of 
that error: none of the relfilenodes given shows up in pg_class, 
even for that dirt simple SELECT.

 One possible mechanism for confusion of that sort would be if the
 spinlock code wasn't quite right, or the compiler was incorrectly
 moving loads/stores into or out of locked sections.  So it might be
 time to ask exactly what kind of hardware this is, which compiler
 PG was built with, etc.

Quad core AMD Opteron. RHEL. Compiled with gcc with all the options 
(basically the standard compilation e.g. --build=x86_64-redhat-linux-gnu)
I can give you more details offlist if it will help.

 On the other hand, if the issue were of that sort then it ought to
 affect all buffers more or less at random; so if you're consistently
 seeing exactly these symptoms (in particular, if it's always
 pg_class_oid_index that's complained of), then I'm not sure I believe
 this theory either.

I've never seen any other index for the btree error, but it has only 
happened a grand total of 3 times ever. The other error appears to 
be fairly random, except that the one particular query that gives 
the btree error always seems to give one version or the other.

 Which PG version again?  Are you in the habit of doing VACUUM FULLs
 on system catalogs, and if so do these glitches correlate at all
 with such activities?

Heh. 8.3.18. Yes, very heavy vac fulls (and reindexes) of the system 
catalogs. Cron-driven, and depends on the time of day and if any 
DDL is running (if so, it does not run), but probably on average 
pg_class is vacfulled and reindexed twice an hour during the times 
this happens (which is, during normal business hours). There is a lot 
in churn in pg_class, pg_attribute, and pg_depend in particular from 
all the temp stuff being created and torn down all day, as well as 
some Bucardo pg_class updating.

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8


pgpLQFTbOf8Tw.pgp
Description: PGP signature


Re: [HACKERS] Btree or not btree? That is the question

2012-06-06 Thread Greg Sabino Mullane
On Mon, Jun 04, 2012 at 02:09:44PM -0400, Tom Lane wrote:
 Greg Sabino Mullane g...@endpoint.com writes:
  We have a 8.3.18 system (yes, the same one from the previous 
  thread, finally upgraded!) that gave us this error yesterday:
 
  ERROR:  index pg_class_oid_index is not a btree
 
 That means you got bogus data while reading the metapage.
 I'm beginning to wonder about the hardware on this server ...

Thanks for the reply. Me too. This is on a cluster[1], so we 
do have the luxury of testing the hardware on each box and 
shuffling things around. I'm also thinking we need to get away 
from the near-constant updating of pg_class, just as a good 
measure.

[1] Bruce, yet another usage! :)

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8


pgpbXwlXRWU0v.pgp
Description: PGP signature


Re: [HACKERS] Btree or not btree? That is the question

2012-06-04 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes:
 We have a 8.3.18 system (yes, the same one from the previous 
 thread, finally upgraded!) that gave us this error yesterday:

 ERROR:  index pg_class_oid_index is not a btree

That means you got bogus data while reading the metapage.
I'm beginning to wonder about the hardware on this server ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers