On May 26, 2012, at 9:17 AM, Tom Lane wrote:

> Would you guys please try this in the problem databases:
> 
> select a.ctid, c.relname
> from pg_attribute a join pg_class c on a.attrelid=c.oid
> where c.relnamespace=11 and c.relkind in ('r','i')
> order by 1 desc;
> 
> If you see any block numbers above about 20 then maybe the triggering
> condition is a row relocation after all.


Sorry for the long delay in replying.  It took a while to move the data
directory elsewhere.  Here are the results:

select a.ctid, c.relname
from pg_attribute a join pg_class c on a.attrelid=c.oid
where c.relnamespace=11 and c.relkind in ('r','i')
order by 1 desc;

  ctid   |                 relname                 
---------+-----------------------------------------
 (18,31) | pg_extension_name_index
 (18,30) | pg_extension_oid_index
 (18,29) | pg_seclabel_object_index
 (18,28) | pg_seclabel_object_index
 (18,27) | pg_seclabel_object_index
 (18,26) | pg_seclabel_object_index
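
For anyone repeating this check, here is a variant that pulls the block number
out of each ctid and keeps only the rows above the rough threshold of 20 that
Tom mentioned.  It relies on the common ctid::text::point cast idiom, which is
my assumption and not something from this thread.  Against the output above,
where the highest block is 18, it should return no rows:

select ((a.ctid::text::point)[0])::int as blkno, c.relname
from pg_attribute a join pg_class c on a.attrelid=c.oid
where c.relnamespace=11 and c.relkind in ('r','i')
  -- assumed idiom: a ctid such as (18,31) casts to a point; [0] is the block
  and (a.ctid::text::point)[0] > 20
order by 1 desc;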

> As the next step, I'd suggest verifying that the stall is reproducible
> if you remove pg_internal.init (and that it's not there otherwise), and
> then strace'ing the single incoming connection to see what it's doing.

With pg_internal.init removed, connecting does take a little while, but
nowhere near as long as the stalls we were seeing before.  The
pg_internal.init file is 108 kB, in case that's a useful data point.

Any other tests you'd like me to run on that bad data dir?

Also, the newly initdb'd system is still running fine so far.  It has also
been upgraded to 9.1.4, so if the problem was the rebuilding of
pg_internal.init, your fix should keep it happy.
