* Alfred Perlstein <[EMAIL PROTECTED]> [001006 16:02] wrote:
> * Tom Lane <[EMAIL PROTECTED]> [001004 09:56] wrote:
> > Alfred Perlstein <[EMAIL PROTECTED]> writes:
> > > I have a reliable way to make postgresql crash after a
> > > couple of hours over here and a backtrace that looks like a good
> > > catch.
> > 
> > I'm interested in pursuing this, but the backtrace doesn't give enough
> > info to debug it.  It looks like the backend is crashing because of
> > a previously-corrupted tuple, so what we'll need to do is work backwards
> > to find where the data corruption is occurring.
> > 
> > Can you boil down the test sequence to something that could be
> > reproduced by other people?  The most convenient way to work on it
> > would be to see it happen here...
> 
> I just wanted to note on the list that these crashes seem to have
> stopped with the latest 7.0.2-patches (as of 11:30ish PM EST Oct.
> 4th). It's been over 24 hours since the upgrade (previously I
> couldn't go for more than 20 hours without a crash).
> 
> My only concern is that I didn't notice anything on the cvs list
> that referenced a fix for crashes.
> 
> Well, anyhow, I'll post an update in a couple of days whether or
> not all is well.

Unfortunately, I'm still getting crashes. This one looks like it
happened during a vacuum. Previously I got a crash while doing an
UPDATE, but in exactly the same spot. It took quite a bit longer to
provoke this time:

-rw-------  1 pgsql  pgsql   277561344 Oct  8 02:56 postgres.core


#0  0x8063c8b in nocachegetattr (tuple=0xbfbfe974, attnum=3,
    tupleDesc=0x84ca368, isnull=0xbfbfe7fb "") at heaptuple.c:537
537                             off = att_addlength(off, att[i]->attlen, tp + off);
(gdb) bt
#0  0x8063c8b in nocachegetattr (tuple=0xbfbfe974, attnum=3, 
    tupleDesc=0x84ca368, isnull=0xbfbfe7fb "") at heaptuple.c:537
#1  0x8075851 in GetIndexValue (tuple=0xbfbfe974, hTupDesc=0x84ca368, 
    attOff=3, attrNums=0x8508240, fInfo=0x0, attNull=0xbfbfe7fb "")
    at indexam.c:445
#2  0x80903be in FormIndexDatum (numberOfAttributes=4, 
    attributeNumber=0x8508240, heapTuple=0xbfbfe974, heapDescriptor=0x84ca368, 
    datum=0x8508018, nullv=0x84ba170 "    ", fInfo=0x0) at index.c:1256
#3  0x80a05e6 in vc_repair_frag (vacrelstats=0x84ba290, onerel=0x84c6788, 
    vacuum_pages=0xbfbfea1c, fraged_pages=0xbfbfea0c, nindices=1, 
    Irel=0x84ba118) at vacuum.c:1634
#4  0x809e3b9 in vc_vacone (relid=1315147913, analyze=0, va_cols=0x0)
    at vacuum.c:640
#5  0x809d9ac in vc_vacuum (VacRelP=0xbfbfeaac, analyze=0 '\000', va_cols=0x0)
    at vacuum.c:299
#6  0x809d934 in vacuum (vacrel=0x84ba0e8 "\030", verbose=1, analyze=0 '\000', 
    va_spec=0x0) at vacuum.c:223
#7  0x810ca8c in ProcessUtility (parsetree=0x84ba110, dest=Remote)
    at utility.c:694
#8  0x810a44e in pg_exec_query_dest (
    query_string=0x81cd370 "VACUUM verbose webhit_details_formatted;", 
    dest=Remote, aclOverride=0) at postgres.c:617
#9  0x810a3a9 in pg_exec_query (
    query_string=0x81cd370 "VACUUM verbose webhit_details_formatted;")
    at postgres.c:562
#10 0x810b336 in PostgresMain (argc=7, argv=0xbfbff12c, real_argc=10, 
    real_argv=0xbfbffb8c) at postgres.c:1588
#11 0x80f0742 in DoBackend (port=0x8464000) at postmaster.c:2009
#12 0x80f02d5 in BackendStartup (port=0x8464000) at postmaster.c:1776
#13 0x80ef4f9 in ServerLoop () at postmaster.c:1037
#14 0x80eeede in PostmasterMain (argc=10, argv=0xbfbffb8c) at postmaster.c:725
#15 0x80bf3eb in main (argc=10, argv=0xbfbffb8c) at main.c:93
#16 0x8063495 in _start ()
(gdb) list
532     
533                                     if (usecache)
534                                             att[i]->attcacheoff = off;
535                             }
536     
537                             off = att_addlength(off, att[i]->attlen, tp + off);
538     
539                             if (usecache &&
540                                     att[i]->attlen == -1 && !VARLENA_FIXED_SIZE(att[i]))
541                                     usecache = false;
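The listing shows how nocachegetattr() finds attribute 3. It walks the earlier attributes and advances off by att_addlength(). For a varlena (attlen == -1), that length comes from the 4-byte header stored in the tuple data itself. In the lines shown, nothing checks that header against the size of the tuple, so a single bad header can send off anywhere.

Below is a minimal, simplified sketch of the same walk with bounds checks added. The names SketchAttr and sketch_attr_offset are hypothetical, alignment padding is ignored, and the header is assumed to count its own 4 bytes, as in the 7.0-era varlena layout. It is not the backend's actual code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified attribute descriptor: only the length matters here. */
typedef struct {
    int attlen;   /* > 0: fixed width; -1: varlena with a 4-byte length header */
} SketchAttr;

/*
 * Walk attributes 0..attnum-1 and return the byte offset of attribute attnum
 * within data[0..datalen), or -1 if a length header is unreadable, impossible,
 * or would move the offset past the end of the buffer.
 * Alignment padding is ignored in this sketch.
 */
static long sketch_attr_offset(const unsigned char *data, long datalen,
                               const SketchAttr *att, int attnum)
{
    long off = 0;

    for (int i = 0; i < attnum; i++) {
        int32_t len;

        if (att[i].attlen > 0) {
            len = att[i].attlen;            /* fixed-width attribute */
        } else {
            if (off + 4 > datalen)
                return -1;                  /* no room for the length header */
            memcpy(&len, data + off, 4);    /* assumed: header counts itself */
            if (len < 4)
                return -1;                  /* negative or too-small length: corrupt */
        }
        if (off + len > datalen)
            return -1;                      /* attribute would run past the tuple */
        off += len;
    }
    return off;
}
```

With an int4 followed by two varlena values, the walk returns the real offset. If the second attribute's header is replaced by a garbage word, the walk returns -1 instead of an offset far outside the tuple.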

It looks like it's dying in the same place as the previous coredumps,
but this time during a VACUUM rather than an UPDATE:

(gdb) print off
$1 = -838833616
(gdb) print att[i]
$2 = 0x84ca640
(gdb) print *(att[i])
$3 = {attrelid = 1315147913, attname = {
    data = "attr_name", '\000' <repeats 22 times>, 
    alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0, 
  attlen = -1, attnum = 3, attnelems = 0, attcacheoff = -1, atttypmod = 36, 
  attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000', 
  attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print i            
$4 = 2
(gdb) print tp
$5 = 0x5808eba5 "Yj"
(gdb) print tp+off       
$6 = 0x260955d5 <Address 0x260955d5 out of bounds>

ack!
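The numbers in this session are consistent with a single bad length word. att[1] (attr_type) has attcacheoff = 4, so its varlena header should sit at tp + 4.

Suppose that header read as 0xce006a2c. That is the value gdb shows as attrs in $14 further down; the $14 struct appears to be raw tuple bytes rather than a real TupleDesc, since its natts word, 0x51006a59, begins with the same "Yj" bytes as tp. If so:

- off = 4 + (int32) 0xce006a2c = -838833616, which matches $1.
- The 32-bit address tp + off wraps to 0x260955d5, which matches $6.

The helpers below are hypothetical and exist only to check that arithmetic.

```c
#include <stdint.h>

/* Offset after adding a varlena whose header word is hdr_word (two's complement assumed). */
static int32_t sketch_next_off(int32_t cacheoff, uint32_t hdr_word)
{
    return cacheoff + (int32_t) hdr_word;
}

/* 32-bit pointer arithmetic as on the i386 box: base + off, wrapping mod 2^32. */
static uint32_t sketch_addr32(uint32_t base, int32_t off)
{
    return base + (uint32_t) off;
}
```

For example, sketch_next_off(4, 0xce006a2cu) gives -838833616, and sketch_addr32(0x5808eba5u, -838833616) gives 0x260955d5. The tuple itself is only 80 bytes long (t_len in $20).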

(gdb) print usecache
$7 = 0 '\000'
(gdb) print attnum
$8 = 3
(gdb) print slow
$9 = 139159376
(gdb) print *slow
$10 = 139241024
(gdb) print (char *) tup + tup->t_hoff
$11 = 0x5808eba5 "Yj"
(gdb) print tup
$12 = 0x5808eba0
(gdb) print *tup
$13 = {t_oid = 0, t_cmin = 6969654, t_cmax = 6958161, t_xmin = 1742, 
  t_xmax = 6955895, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 639}, 
    ip_posid = 84}, t_natts = 737, t_infomask = 32846, t_hoff = 5 '\005', 
  t_bits = "\000\002¥ "}
(gdb) print *tupleDesc 
$14 = {natts = 1358981721, attrs = 0xce006a2c, constr = 0x77000006}
(gdb) print *(att[0])
$15 = {attrelid = 1315147913, attname = {
    data = "counter_id", '\000' <repeats 21 times>, 
    alignmentDummy = 1853189987}, atttypid = 23, attdisbursion = 0, 
  attlen = 4, attnum = 1, attnelems = 0, attcacheoff = 0, atttypmod = -1, 
  attbyval = 1 '\001', attstorage = 112 'p', attisset = 0 '\000', 
  attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[1])
$16 = {attrelid = 1315147913, attname = {
    data = "attr_type", '\000' <repeats 22 times>, 
    alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0, 
  attlen = -1, attnum = 2, attnelems = 0, attcacheoff = 4, atttypmod = 36, 
  attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000', 
  attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[2])
$17 = {attrelid = 1315147913, attname = {
    data = "attr_name", '\000' <repeats 22 times>, 
    alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0, 
  attlen = -1, attnum = 3, attnelems = 0, attcacheoff = -1, atttypmod = 36, 
  attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000', 
  attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[3])
$18 = {attrelid = 1315147913, attname = {
    data = "attr_vers", '\000' <repeats 22 times>, 
    alignmentDummy = 1920234593}, atttypid = 1043, attdisbursion = 0, 
  attlen = -1, attnum = 4, attnelems = 0, attcacheoff = -1, atttypmod = 36, 
  attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000', 
  attalign = 105 'i', attnotnull = 0 '\000', atthasdef = 0 '\000'}
(gdb) print *(att[4])
$19 = {attrelid = 1315147913, attname = {
    data = "attr_hits", '\000' <repeats 22 times>, 
    alignmentDummy = 1920234593}, atttypid = 20, attdisbursion = 0, 
  attlen = 8, attnum = 5, attnelems = 0, attcacheoff = -1, atttypmod = -1, 
  attbyval = 0 '\000', attstorage = 112 'p', attisset = 0 '\000', 
  attalign = 100 'd', attnotnull = 0 '\000', atthasdef = 1 '\001'}
(gdb) print *tuple
$20 = {t_len = 80, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 640}, 
    ip_posid = 5}, t_datamcxt = 0x0, t_data = 0x5808eba0}



thanks,
-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."
