On 2016-06-09 23:39:24 -0700, Andres Freund wrote:
> On 2016-06-10 11:58:26 +0530, Amit Kapila wrote:
> > I have tried in multiple ways by running pgbench with read-write tests, but
> > could not see any such behaviour.
> 
> It took over an hour of pgbench on a fast laptop till I saw it.
> 
> 
> > I have tried by even crashing and
> > restarting the server and then again running pgbench.  Do you see these
> > records on master or slave?
> 
> Master, but with an existing standby. So it could be related to
> hot_standby_feedback or such.

I just managed to trigger it again.


#1  0x00007fa1a73778da in __GI_abort () at abort.c:89
#2  0x00007f9f1395e59c in record_corrupt_item (items=items@entry=0x2137be0, 
tid=0x7f9fb8681c0c)
    at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:612
#3  0x00007f9f1395ead5 in collect_corrupt_items (relid=relid@entry=29449, 
all_visible=all_visible@entry=0 '\000', all_frozen=all_frozen@entry=1 '\001')
    at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:572
#4  0x00007f9f1395f476 in pg_check_frozen (fcinfo=0x7ffe5343a200) at 
/home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:292
#5  0x00000000005fdbec in ExecMakeTableFunctionResult (funcexpr=0x2168630, 
econtext=0x2168320, argContext=<optimized out>, expectedDesc=0x2168ef0, 
    randomAccess=0 '\000') at 
/home/andres/src/postgresql/src/backend/executor/execQual.c:2211
#6  0x0000000000616992 in FunctionNext (node=node@entry=0x2168210) at 
/home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:94
#7  0x00000000005ffdcb in ExecScanFetch (recheckMtd=0x6166f0 <FunctionRecheck>, 
accessMtd=0x616700 <FunctionNext>, node=0x2168210)
    at /home/andres/src/postgresql/src/backend/executor/execScan.c:95
#8  ExecScan (node=node@entry=0x2168210, accessMtd=accessMtd@entry=0x616700 
<FunctionNext>, recheckMtd=recheckMtd@entry=0x6166f0 <FunctionRecheck>)
    at /home/andres/src/postgresql/src/backend/executor/execScan.c:145
#9  0x00000000006169e4 in ExecFunctionScan (node=node@entry=0x2168210) at 
/home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:268

the error happened just after I restarted a standby, so it's not
unlikely to be related to hot_standby_feedback.


(gdb) p *tuple.t_data
$5 = {t_choice = {t_heap = {t_xmin = 9105470, t_xmax = 26049273, t_field3 = 
{t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 9105470, 
      datum_typmod = 26049273, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi 
= 1, bi_lo = 19765}, ip_posid = 3}, t_infomask2 = 4, t_infomask = 770, 
  t_hoff = 24 '\030', t_bits = 0x7f9fb8681c17 ""}

Infomask is:
#define HEAP_XMIN_COMMITTED             0x0100  /* t_xmin committed */
#define HEAP_XMIN_INVALID               0x0200  /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN                (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
#define HEAP_HASVARWIDTH                0x0002  /* has variable-width 
attribute(s) */

This indeed looks borked.  Such a tuple should never survive
                if (check_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer))
                        check_frozen = false;
especially not when
(gdb) p PageIsAllVisible(page)
$3 = 4

(fwiw, checking PD_ALL_VISIBLE in those functions sounds like a good plan)


I've got another earlier case (that I somehow missed seeing), below
check_visible:

(gdb) p *tuple->t_data 
$2 = {t_choice = {t_heap = {t_xmin = 13616549, t_xmax = 25210801, t_field3 = 
{t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 13616549, 
      datum_typmod = 25210801, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi 
= 0, bi_lo = 52320}, ip_posid = 67}, t_infomask2 = 32772, t_infomask = 8962, 
  t_hoff = 24 '\030', t_bits = 0x7f9fda2f8717 ""}

infomask is:
#define HEAP_UPDATED                    0x2000  /* this is UPDATEd version of 
row */
#define HEAP_XMIN_COMMITTED             0x0100  /* t_xmin committed */
#define HEAP_XMIN_INVALID               0x0200  /* t_xmin invalid/aborted */
#define HEAP_HASVARWIDTH                0x0002  /* has variable-width 
attribute(s) */
infomask2 is:
#define HEAP_ONLY_TUPLE                 0x8000  /* this is heap-only tuple */

I'll run again, with a debugger attached, maybe I can get some more
information.


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to