Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-28 Thread Greg Stark
On 28 Feb 2014 06:19, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-02-27 23:41:08 +, Greg Stark wrote:
  Though I notice something I can't understand here.
 
  After activating the new clone subsequent attempts to select rows from
  the page bump the LSN, presumably due to touching hint bits (since the
  prune xid hasn't changed). But the checksum hasn't changed even after
  running CHECKPOINT.

 Are you running with full_page_writes=off?

No

 Only delete and update do a PageSetPrunable(), so prune_xid not being
 changed doesn't say much...

  How is it possible for the LSN to get updated without changing the
checksum?

 Generally the LSN is computed when writing, not when a buffer is
 modified, so that's not particularly surprising. It'd be interesting to
 see what the records are that end on those LSNs.

The checksum you mean? But that's why I ran checkpoint.

 It'd probably nice to add the capability to dump records that end in a
 particular location to pg_xlogdump...

I have this crazy idea about combining xlogdump and pg_receivexlog to
archive all xlog to a postgres database for querying.


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-28 Thread Andres Freund
On 2014-02-28 10:44:14 +, Greg Stark wrote:
 On 28 Feb 2014 06:19, Andres Freund and...@2ndquadrant.com wrote:
  Generally the LSN is computed when writing, not when a buffer is
  modified, so that's not particularly surprising. It'd be interesting to
  see what the records are that end on those LSNs.
 
 The checksum you mean? But that's why I ran checkpoint.

The checksum is computed in a copy of the buffer, not in the
original. C.f. PageSetChecksumCopy().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-27 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-02-26 18:18:05 -0300, Alvaro Herrera wrote:
  Andres Freund wrote:
  
   static void
   heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record)
   {
   ...
   HeapTupleHeaderClearHotUpdated(htup);
   HeapTupleHeaderSetXmax(htup, xlrec-locking_xid);
   HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
   /* Make sure there is no forward chain link in t_ctid */
   htup-t_ctid = xlrec-target.tid;
   ...
   }
  
  I think the fix is to reset HOT_UPDATED and t_ctid only if the infomask
  says the tuple is LOCKED_ONLY, per the attached patch.
 
 Looks good to me.

Thanks, pushed.

Greg, Peter, if you could update your standbys to the current HEAD of
REL9_3_STABLE for the affected apps and verify the problem no longer
shows up in a reasonable timeframe, it would be great.  (I'm assuming
you saw this happen repeatedly.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-27 Thread Greg Stark
On Thu, Feb 27, 2014 at 2:34 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Greg, Peter, if you could update your standbys to the current HEAD of
 REL9_3_STABLE for the affected apps and verify the problem no longer
 shows up in a reasonable timeframe, it would be great.  (I'm assuming
 you saw this happen repeatedly.)

Actually I can do better. I restored the same base backup and applied
the same period of logs and can confirm that the rows in question now
appear normally:

=# select * from (select
(heap_page_items(get_raw_page('users',13065))).*) as x where lp
in(2,3,4,7);

 lp | lp_off | lp_flags | lp_len | t_xmin  | t_xmax  | t_field3 |
t_ctid   | t_infomask2 | t_infomask | t_hoff |
++--++-+-+--+---+-+++---
  2 |   3424 |1 |232 | 5943845 |  728896 |0 |
(13065,3) |   16416 |   4419 | 32 |
  3 |   3152 |1 |272 | 5943849 | 5943879 |0 |
(13065,4) |   49184 |   9475 | 32 |
  4 |   2864 |1 |287 | 5943879 | 5943880 |0 |
(13065,7) |   49184 |   9475 | 32 |
  7 |   2576 |1 |287 | 5943880 |   0 |0 |
(13065,7) |   32800 |  10499 | 32 |



-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-27 Thread Greg Stark
Though I notice something I can't understand here.

After activating the new clone subsequent attempts to select rows from
the page bump the LSN, presumably due to touching hint bits (since the
prune xid hasn't changed). But the checksum hasn't changed even after
running CHECKPOINT.

How is it possible for the LSN to get updated without changing the checksum?

postgres=# select (page_header(get_raw_page('users',13065))).* ;
 lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-+--+---+---+---+-+--+-+---
 FD/330EC998 |   -25547 | 1 |   152 |  2576 |8192 | 8192 |
  4 |   5638282
(1 row)

postgres=# select (page_header(get_raw_page('users',13065))).* ;
 lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-+--+---+---+---+-+--+-+---
 FD/33140160 |   -25547 | 1 |   152 |  2576 |8192 | 8192 |
  4 |   5638282
(1 row)

postgres=# select (page_header(get_raw_page('users',13065))).* ;
 lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
-+--+---+---+---+-+--+-+---
 FD/350016E8 |   -25547 | 1 |   152 |  2576 |8192 | 8192 |
  4 |   5638282
(1 row)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-27 Thread Andres Freund
On 2014-02-27 23:41:08 +, Greg Stark wrote:
 Though I notice something I can't understand here.
 
 After activating the new clone subsequent attempts to select rows from
 the page bump the LSN, presumably due to touching hint bits (since the
 prune xid hasn't changed). But the checksum hasn't changed even after
 running CHECKPOINT.

Are you running with full_page_writes=off?

Only delete and update do a PageSetPrunable(), so prune_xid not being
changed doesn't say much...

 How is it possible for the LSN to get updated without changing the checksum?

Generally the LSN is computed when writing, not when a buffer is
modified, so that's not particularly surprising. It'd be interesting to
see what the records are that end on those LSNs.

It'd probably nice to add the capability to dump records that end in a
particular location to pg_xlogdump...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-26 Thread Alvaro Herrera
Andres Freund wrote:

 static void
 heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record)
 {
 ...
 HeapTupleHeaderClearHotUpdated(htup);
 HeapTupleHeaderSetXmax(htup, xlrec-locking_xid);
 HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
 /* Make sure there is no forward chain link in t_ctid */
 htup-t_ctid = xlrec-target.tid;
 ...
 }

I think the fix is to reset HOT_UPDATED and t_ctid only if the infomask
says the tuple is LOCKED_ONLY, per the attached patch.  This matches
what heap_lock_tuple is doing originally:

if (HEAP_XMAX_IS_LOCKED_ONLY(new_infomask))
HeapTupleHeaderClearHotUpdated(tuple-t_data);
HeapTupleHeaderSetXmax(tuple-t_data, xid);

/*
 * Make sure there is no forward chain link in t_ctid.  Note that in the
 * cases where the tuple has been updated, we must not overwrite t_ctid,
 * because it was set by the updater.  Moreover, if the tuple has been
 * updated, we need to follow the update chain to lock the new versions 
of
 * the tuple as well.
 */
if (HEAP_XMAX_IS_LOCKED_ONLY(new_infomask))
tuple-t_data-t_ctid = *tid;

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a771ccb..6d1f9e0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8016,11 +8016,14 @@ heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record)
 
 	fix_infomask_from_infobits(xlrec-infobits_set, htup-t_infomask,
 			   htup-t_infomask2);
-	HeapTupleHeaderClearHotUpdated(htup);
+	if (HEAP_XMAX_IS_LOCKED_ONLY(htup-t_infomask))
+	{
+		HeapTupleHeaderClearHotUpdated(htup);
+		/* Make sure there is no forward chain link in t_ctid */
+		htup-t_ctid = xlrec-target.tid;
+	}
 	HeapTupleHeaderSetXmax(htup, xlrec-locking_xid);
 	HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
-	/* Make sure there is no forward chain link in t_ctid */
-	htup-t_ctid = xlrec-target.tid;
 	PageSetLSN(page, lsn);
 	MarkBufferDirty(buffer);
 	UnlockReleaseBuffer(buffer);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-26 Thread Alvaro Herrera
I forgot to mention that the bug can be reproduced in a hot-standby
setup with the attached isolation spec.  Note that full_page_writes must
be turned off (otherwise, the updates use full-page writes and then the
bogus code is not run).  Once the spec is executed, in the replica run

SET enable_seqscan TO off;
SELECT * FROM f WHERE a=1;

and you get no rows back.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services

setup
{
DROP TABLE IF EXISTS f;
CREATE TABLE f (a int primary key, b int);
INSERT INTO f values (1, 42);
}

session s1
step begin{ BEGIN; }
step update   { UPDATE f SET b = b + 1; }
step commit   { COMMIT; }

session s2
step lock { SELECT * FROM f FOR KEY SHARE; }

permutation begin update update lock commit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-26 Thread Andres Freund
On 2014-02-26 18:18:05 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
 
  static void
  heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record)
  {
  ...
  HeapTupleHeaderClearHotUpdated(htup);
  HeapTupleHeaderSetXmax(htup, xlrec-locking_xid);
  HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
  /* Make sure there is no forward chain link in t_ctid */
  htup-t_ctid = xlrec-target.tid;
  ...
  }
 
 I think the fix is to reset HOT_UPDATED and t_ctid only if the infomask
 says the tuple is LOCKED_ONLY, per the attached patch.

Looks good to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-24 Thread Andres Freund
On 2014-02-20 13:25:35 +, Greg Stark wrote:
 rmgr: Heaplen (rec/tot):235/   267, tx:5943845, lsn:
 FD/2F0A3640, prev FD/2F0A3600, bkp: , desc: insert: rel
 1663/16385/212653; tid 13065/2

  lp | lp_off | lp_flags | lp_len | t_xmin  | t_xmax  | t_field3 |
 t_ctid   | t_infomask2 | t_infomask | t_hoff |
 ++--++-+-+--++-+++-
   2 |   3424 |1 |232 | 5943845 |  728896 |0 |
 (13065,2)  |  32 |   4419 | 32 |
   3 |   3152 |1 |272 | 5943849 | 5943879 |0 |
 (13065,4)  |   49184 |   9475 | 32 |
   4 |   2864 |1 |287 | 5943879 | 5943880 |0 |
 (13065,7)  |   49184 |   9475 | 32 |
   7 |   2576 |1 |287 | 5943880 |   0 |0 |
 (13065,7)  |   32800 |  10499 | 32 |

Any chance you could send those again, with less linebreak damage? They
are really hard to read this way...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-24 Thread Alvaro Herrera
Here's a reformatted copy.  I think this is the same bug as Peter G.
reported in 
http://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=o67w0csswpvv7xfuc...@mail.gmail.com

I have a hunch that this is related to the heap_lock_updated business.
I haven't investigated yet.


Greg Stark wrote:
 I have a database where a a couple rows don't appear in index scans
 but do appear in sequential scans. It looks like the same problem as
 Peter reported but this is a different database. I've extracted all
 the xlogdump records and below are the ones I think are relevant. You
 can see that lp 2 gets a few HOT updates and concurrently has someone
 create a MultiXact NO KEY UPDATE lock while one of those HOT updates
 is pending but not committed. The net result seems to be that the ctid
 update chain got broken. The index of course points to the head of the
 HOT chain so it doesn't find the live tail whereas the sequential scan
 picks it up.
 
 I don't see any evidence of MultiXactId wraparound, the members run to
 001F and the offsets run to 000B. This is on a standby that's been
 activated but afaik that shouldn't change these files any more right?
 
 rmgr: Heaplen (rec/tot):235/   267, tx:5943845, lsn: 
 FD/2F0A3640, prev FD/2F0A3600, bkp: , desc: insert: rel 
 1663/16385/212653; tid 13065/2
 rmgr: Transaction len (rec/tot): 12/44, tx:5943845, lsn: 
 FD/2F0A8178, prev FD/2F0A8148, bkp: , desc: commit: 2014-02-19 
 20:41:23.698513 UTC
 rmgr: Heaplen (rec/tot): 25/57, tx:5943847, lsn: 
 FD/2F0AA440, prev FD/2F0AA3F8, bkp: , desc: lock 5943847: rel 
 1663/16385/212653; tid 13065/2 LOCK_ONLY KEYSHR_LOCK
 rmgr: Transaction len (rec/tot): 12/44, tx:5943847, lsn: 
 FD/2F0AA480, prev FD/2F0AA440, bkp: , desc: commit: 2014-02-19 
 20:41:23.713969 UTC
 rmgr: Heaplen (rec/tot):291/   323, tx:5943849, lsn: 
 FD/2F0ADFC0, prev FD/2F0ADF90, bkp: , desc: hot_update: rel 
 1663/16385/212653; tid 13065/2 xmax 5943849 ; new tid 13065/3 xmax 0
 rmgr: Heap2   len (rec/tot): 25/57, tx:5943851, lsn: 
 FD/2F0AE450, prev FD/2F0AE408, bkp: , desc: lock updated: xmax 5943851 
 msk 000a; rel 1663/16385/212653; tid 13065/3
 rmgr: MultiXact   len (rec/tot): 28/60, tx:5943851, lsn: 
 FD/2F0AE490, prev FD/2F0AE450, bkp: , desc: create mxid 728896 offset 
 1632045 nmembers 2: 5943849 (nokeyupd) 5943851 (keysh)
 rmgr: Heaplen (rec/tot): 25/57, tx:5943851, lsn: 
 FD/2F0AE4D0, prev FD/2F0AE490, bkp: , desc: lock 728896: rel 
 1663/16385/212653; tid 13065/2 IS_MULTI EXCL_LOCK
 rmgr: Transaction len (rec/tot): 12/44, tx:5943849, lsn: 
 FD/2F0AE510, prev FD/2F0AE4D0, bkp: , desc: commit: 2014-02-19 
 20:41:23.744989 UTC
 rmgr: Transaction len (rec/tot): 12/44, tx:5943851, lsn: 
 FD/2F0AE570, prev FD/2F0AE540, bkp: , desc: commit: 2014-02-19 
 20:41:23.746820 UTC
 rmgr: Heaplen (rec/tot):306/   338, tx:5943879, lsn: 
 FD/2F103788, prev FD/2F103758, bkp: , desc: hot_update: rel 
 1663/16385/212653; tid 13065/3 xmax 5943879 ; new tid 13065/4 xmax 0
 rmgr: Transaction len (rec/tot): 12/44, tx:5943879, lsn: 
 FD/2F1038E0, prev FD/2F103788, bkp: , desc: commit: 2014-02-19 
 20:41:24.580827 UTC
 rmgr: Heaplen (rec/tot):306/   338, tx:5943880, lsn: 
 FD/2F103910, prev FD/2F1038E0, bkp: , desc: hot_update: rel 
 1663/16385/212653; tid 13065/4 xmax 5943880 ; new tid 13065/7 xmax 0
 rmgr: Transaction len (rec/tot): 12/44, tx:5943880, lsn: 
 FD/2F106070, prev FD/2F106030, bkp: , desc: commit: 2014-02-19 
 20:41:24.617048 UTC
 
 
  lp | lp_off | lp_flags | lp_len | t_xmin  | t_xmax  | t_field3 |   t_ctid   
 | t_infomask2 | t_infomask | t_hoff |
 ++--++-+-+--++-+++-
   2 |   3424 |1 |232 | 5943845 |  728896 |0 | (13065,2)  
 |  32 |   4419 | 32 |
   3 |   3152 |1 |272 | 5943849 | 5943879 |0 | (13065,4)  
 |   49184 |   9475 | 32 |
   4 |   2864 |1 |287 | 5943879 | 5943880 |0 | (13065,7)  
 |   49184 |   9475 | 32 |
   7 |   2576 |1 |287 | 5943880 |   0 |0 | (13065,7)  
 |   32800 |  10499 | 32 |
 
 
 -- 
 greg
 
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-24 Thread Andres Freund
Hi,

On 2014-02-24 17:55:14 -0300, Alvaro Herrera wrote:
 Greg Stark wrote:
  I have a database where a a couple rows don't appear in index scans
  but do appear in sequential scans. It looks like the same problem as
  Peter reported but this is a different database. I've extracted all
  the xlogdump records and below are the ones I think are relevant. You
  can see that lp 2 gets a few HOT updates and concurrently has someone
  create a MultiXact NO KEY UPDATE lock while one of those HOT updates
  is pending but not committed.

Per se the sequence of records doesn't look bad (even though I am not
happy that we log intermediate and final rows first, and only then the
start of the chain).

  The net result seems to be that the ctid
  update chain got broken. The index of course points to the head of the
  HOT chain so it doesn't find the live tail whereas the sequential scan
  picks it up.

Yea, that's the problem.

  rmgr: Heaplen (rec/tot):291/   323, tx:5943849, lsn: 
  FD/2F0ADFC0, prev FD/2F0ADF90, bkp: , desc: hot_update: rel 
  1663/16385/212653; tid 13065/2 xmax 5943849 ; new tid 13065/3 xmax 0
  rmgr: Heap2   len (rec/tot): 25/57, tx:5943851, lsn: 
  FD/2F0AE450, prev FD/2F0AE408, bkp: , desc: lock updated: xmax 5943851 
  msk 000a; rel 1663/16385/212653; tid 13065/3
  rmgr: MultiXact   len (rec/tot): 28/60, tx:5943851, lsn: 
  FD/2F0AE490, prev FD/2F0AE450, bkp: , desc: create mxid 728896 offset 
  1632045 nmembers 2: 5943849 (nokeyupd) 5943851 (keysh)
  rmgr: Heaplen (rec/tot): 25/57, tx:5943851, lsn: 
  FD/2F0AE4D0, prev FD/2F0AE490, bkp: , desc: lock 728896: rel 
  1663/16385/212653; tid 13065/2 IS_MULTI EXCL_LOCK

   lp | lp_off | lp_flags | lp_len | t_xmin  | t_xmax  | t_field3 |   t_ctid  
   | t_infomask2 | t_infomask | t_hoff |
  ++--++-+-+--++-+++-
2 |   3424 |1 |232 | 5943845 |  728896 |0 | (13065,2) 
   |  32 |   4419 | 32 |
3 |   3152 |1 |272 | 5943849 | 5943879 |0 | (13065,4) 
   |   49184 |   9475 | 32 |
4 |   2864 |1 |287 | 5943879 | 5943880 |0 | (13065,7) 
   |   49184 |   9475 | 32 |
7 |   2576 |1 |287 | 5943880 |   0 |0 | (13065,7) 
   |   32800 |  10499 | 32 |

Those together explain the story. Note this bit:

static void
heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record)
{
...
HeapTupleHeaderClearHotUpdated(htup);
HeapTupleHeaderSetXmax(htup, xlrec-locking_xid);
HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
/* Make sure there is no forward chain link in t_ctid */
htup-t_ctid = xlrec-target.tid;
...
}

So, the replay of FD/2F0AE4D0 breaks the ctid chain *and* unsets the
HOT_UPDATED flag.

Which means fkey locks have never properly worked across SR/crash
recovery.

Haven't thought about how to fix it yet, I hope won't have to (hint hint).

We somehow need to have a policy of testing changes to the WAL format
without full_page_writes. They hide bugs in replay far, far too often.


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-24 Thread Andres Freund
On 2014-02-24 22:17:31 +0100, Andres Freund wrote:
 Those together explain the story. Note this bit:
 
 static void
 heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record)
 {
 ...
 HeapTupleHeaderClearHotUpdated(htup);
 HeapTupleHeaderSetXmax(htup, xlrec-locking_xid);
 HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
 /* Make sure there is no forward chain link in t_ctid */
 htup-t_ctid = xlrec-target.tid;
 ...
 }
 
 So, the replay of FD/2F0AE4D0 breaks the ctid chain *and* unsets the
 HOT_UPDATED flag.

Some quick archeology shows that the HeapTupleHeaderClearHotUpdated()
was in the original HOT commit (282d2a03d) and clearing of t_ctid was in
the original commit implementing FOR SHARE (bedb78d38). Both look like
they are copied from other places, I don't immediately see any need for
them here...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-24 Thread Peter Geoghegan
On Mon, Feb 24, 2014 at 1:17 PM, Andres Freund and...@2ndquadrant.com wrote:
 We somehow need to have a policy of testing changes to the WAL format
 without full_page_writes. They hide bugs in replay far, far too often.

What's the easiest way to get atomic page writes at the FS level on
your laptop? ZFS or some data journaling FS, I suppose.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-24 Thread Andres Freund
On 2014-02-24 15:05:37 -0800, Peter Geoghegan wrote:
 On Mon, Feb 24, 2014 at 1:17 PM, Andres Freund and...@2ndquadrant.com wrote:
  We somehow need to have a policy of testing changes to the WAL format
  without full_page_writes. They hide bugs in replay far, far too often.
 
 What's the easiest way to get atomic page writes at the FS level on
 your laptop? ZFS or some data journaling FS, I suppose.

TBH I don't care about torn pages during normal testing. I don't want to
suggest disabling it for real workloads with real data, just that it's
important to do so during development/testing of WAL related code,
because otherwise it will hide/fixup many errors.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-24 Thread Peter Geoghegan
On Mon, Feb 24, 2014 at 3:17 PM, Andres Freund and...@2ndquadrant.com wrote:
 TBH I don't care about torn pages during normal testing. I don't want to
 suggest disabling it for real workloads with real data, just that it's
 important to do so during development/testing of WAL related code,
 because otherwise it will hide/fixup many errors.

Sure, but you might want to account for torn pages anyway.
Particularly if you're interested in some degree of automation, as we
all seem to be.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-24 Thread Andres Freund
On 2014-02-24 15:20:13 -0800, Peter Geoghegan wrote:
 On Mon, Feb 24, 2014 at 3:17 PM, Andres Freund and...@2ndquadrant.com wrote:
  TBH I don't care about torn pages during normal testing. I don't want to
  suggest disabling it for real workloads with real data, just that it's
  important to do so during development/testing of WAL related code,
  because otherwise it will hide/fixup many errors.
 
 Sure, but you might want to account for torn pages anyway.
 Particularly if you're interested in some degree of automation, as we
 all seem to be.

Hm, well. I have to admit, if a test machine crashes, I'd just rebuild
the cluster. Unless I am working on crash safety testing in which case
I'd probably not have full_page_writes disabled...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-20 Thread Greg Stark
I have a database where a a couple rows don't appear in index scans
but do appear in sequential scans. It looks like the same problem as
Peter reported but this is a different database. I've extracted all
the xlogdump records and below are the ones I think are relevant. You
can see that lp 2 gets a few HOT updates and concurrently has someone
create a MultiXact NO KEY UPDATE lock while one of those HOT updates
is pending but not committed. The net result seems to be that the ctid
update chain got broken. The index of course points to the head of the
HOT chain so it doesn't find the live tail whereas the sequential scan
picks it up.

I don't see any evidence of MultiXactId wraparound, the members run to
001F and the offsets run to 000B. This is on a standby that's been
activated but afaik that shouldn't change these files any more right?

rmgr: Heaplen (rec/tot):235/   267, tx:5943845, lsn:
FD/2F0A3640, prev FD/2F0A3600, bkp: , desc: insert: rel
1663/16385/212653; tid 13065/2
rmgr: Transaction len (rec/tot): 12/44, tx:5943845, lsn:
FD/2F0A8178, prev FD/2F0A8148, bkp: , desc: commit: 2014-02-19
20:41:23.698513 UTC
rmgr: Heaplen (rec/tot): 25/57, tx:5943847, lsn:
FD/2F0AA440, prev FD/2F0AA3F8, bkp: , desc: lock 5943847: rel
1663/16385/212653; tid 13065/2 LOCK_ONLY KEYSHR_LOCK
rmgr: Transaction len (rec/tot): 12/44, tx:5943847, lsn:
FD/2F0AA480, prev FD/2F0AA440, bkp: , desc: commit: 2014-02-19
20:41:23.713969 UTC
rmgr: Heaplen (rec/tot):291/   323, tx:5943849, lsn:
FD/2F0ADFC0, prev FD/2F0ADF90, bkp: , desc: hot_update: rel
1663/16385/212653; tid 13065/2 xmax 5943849 ; new tid 13065/3 xmax 0
rmgr: Heap2   len (rec/tot): 25/57, tx:5943851, lsn:
FD/2F0AE450, prev FD/2F0AE408, bkp: , desc: lock updated: xmax
5943851 msk 000a; rel 1663/16385/212653; tid 13065/3
rmgr: MultiXact   len (rec/tot): 28/60, tx:5943851, lsn:
FD/2F0AE490, prev FD/2F0AE450, bkp: , desc: create mxid 728896
offset 1632045 nmembers 2: 5943849 (nokeyupd) 5943851 (keysh)
rmgr: Heaplen (rec/tot): 25/57, tx:5943851, lsn:
FD/2F0AE4D0, prev FD/2F0AE490, bkp: , desc: lock 728896: rel
1663/16385/212653; tid 13065/2 IS_MULTI EXCL_LOCK
rmgr: Transaction len (rec/tot): 12/44, tx:5943849, lsn:
FD/2F0AE510, prev FD/2F0AE4D0, bkp: , desc: commit: 2014-02-19
20:41:23.744989 UTC
rmgr: Transaction len (rec/tot): 12/44, tx:5943851, lsn:
FD/2F0AE570, prev FD/2F0AE540, bkp: , desc: commit: 2014-02-19
20:41:23.746820 UTC
rmgr: Heaplen (rec/tot):306/   338, tx:5943879, lsn:
FD/2F103788, prev FD/2F103758, bkp: , desc: hot_update: rel
1663/16385/212653; tid 13065/3 xmax 5943879 ; new tid 13065/4 xmax 0
rmgr: Transaction len (rec/tot): 12/44, tx:5943879, lsn:
FD/2F1038E0, prev FD/2F103788, bkp: , desc: commit: 2014-02-19
20:41:24.580827 UTC
rmgr: Heaplen (rec/tot):306/   338, tx:5943880, lsn:
FD/2F103910, prev FD/2F1038E0, bkp: , desc: hot_update: rel
1663/16385/212653; tid 13065/4 xmax 5943880 ; new tid 13065/7 xmax 0
rmgr: Transaction len (rec/tot): 12/44, tx:5943880, lsn:
FD/2F106070, prev FD/2F106030, bkp: , desc: commit: 2014-02-19
20:41:24.617048 UTC


 lp | lp_off | lp_flags | lp_len | t_xmin  | t_xmax  | t_field3 |
t_ctid   | t_infomask2 | t_infomask | t_hoff |
++--++-+-+--++-+++-
  2 |   3424 |1 |232 | 5943845 |  728896 |0 |
(13065,2)  |  32 |   4419 | 32 |
  3 |   3152 |1 |272 | 5943849 | 5943879 |0 |
(13065,4)  |   49184 |   9475 | 32 |
  4 |   2864 |1 |287 | 5943879 | 5943880 |0 |
(13065,7)  |   49184 |   9475 | 32 |
  7 |   2576 |1 |287 | 5943880 |   0 |0 |
(13065,7)  |   32800 |  10499 | 32 |


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?

2014-02-20 Thread Andres Freund
Hi,

On 2014-02-20 13:25:35 +, Greg Stark wrote:
 I have a database where a a couple rows don't appear in index scans
 but do appear in sequential scans. It looks like the same problem as
 Peter reported but this is a different database. I've extracted all
 the xlogdump records and below are the ones I think are relevant. You
 can see that lp 2 gets a few HOT updates and concurrently has someone
 create a MultiXact NO KEY UPDATE lock while one of those HOT updates
 is pending but not committed. The net result seems to be that the ctid
 update chain got broken. The index of course points to the head of the
 HOT chain so it doesn't find the live tail whereas the sequential scan
 picks it up.
 
 I don't see any evidence of MultiXactId wraparound, the members run to
 001F and the offsets run to 000B. This is on a standby that's been
 activated but afaik that shouldn't change these files any more right?

I think this might actually be
c6cd27e36b9c58ceda8582ba81e37b6f9ad87d59,
2dcc48c35af5305fba0d8cb5e31fa0c25f52d13f might also be involved.

Hard to say at my current tiredness/caffeination level.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers