Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-16 Thread Greg Stark
On Sat, Feb 15, 2014 at 11:45 AM, Andres Freund and...@2ndquadrant.com wrote: I guess the theoretically correct thing would be to make all WAL records about truncation and unlinking contain the current size of the relation, but especially with deletions and forks that will probably turn out to

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-15 Thread Andres Freund
On 2014-02-14 22:30:45 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-02-14 20:46:01 +0000, Greg Stark wrote: Going over this I think this is still a potential issue: On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote: I am not sure that explains

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Greg Stark
Going over this I think this is still a potential issue: On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote: I am not sure that explains the issue, but I think the redo action for truncation is not safe across crashes. An XLOG_SMGR_TRUNCATE will just do a smgrtruncate() (and

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Andres Freund
On 2014-02-14 20:46:01 +0000, Greg Stark wrote: Going over this I think this is still a potential issue: On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote: I am not sure that explains the issue, but I think the redo action for truncation is not safe across crashes. A

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2014-02-14 20:46:01 +0000, Greg Stark wrote: Going over this I think this is still a potential issue: On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote: I am not sure that explains the issue, but I think the redo action for

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Andrea Suisani
Hi all, On 02/12/2014 08:27 PM, Greg Stark wrote: On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: For what it's worth I've confirmed the bug in wal-e caused the initial problem. Huh? Bug in wal-e? What bug? WAL-E actually didn't

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Greg Stark
I think what you're arguing is that we should see WAL records filling the rest of segment 1 before we see any references to segment 2, but if that's the case then how did we get into the situation you reported? Or is it just that it was a broken base backup to start with? The scenario I

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Tom Lane
Greg Stark st...@mit.edu writes: I think what you're arguing is that we should see WAL records filling the rest of segment 1 before we see any references to segment 2, but if that's the case then how did we get into the situation you reported? Or is it just that it was a broken base backup to

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Greg Stark
On Thu, Feb 13, 2014 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: The scenario I could come up with that didn't require a broken base backup was that there was an earlier truncate or vacuum. So the sequence is high offset reference, truncate, growth, crash. All possibly on a single database.

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-13 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Thu, Feb 13, 2014 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: That's what's bothering me, too. On the other hand, if we can't think of a scenario where it'd be necessary to replay the high-offset update, then I'm disinclined to mess with the code

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Greg Stark
So I think I've come up with a scenario that could cause this. I don't think it's exactly what happened here but maybe something analogous happened with our base backup restore. On the primary you extend a table a bunch, including adding new segments, but crash before committing (or

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Greg Stark
So here's my attempt to rewrite this logic. I ended up refactoring a bit because I found it unnecessarily confusing having the mode branches in several places. I think it's much clearer just having two separate pieces of logic for RBM_NEW and the extension cases since all they have in common is

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
Greg Stark st...@mit.edu writes: So here's my attempt to rewrite this logic. I ended up refactoring a bit because I found it unnecessarily confusing having the mode branches in several places. I think it's much clearer just having two separate pieces of logic for RBM_NEW and the extension

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
Greg Stark st...@mit.edu writes: So I think I've come up with a scenario that could cause this. I don't think it's exactly what happened here but maybe something analogous happened with our base backup restore. I agree it seems like a good idea for XLogReadBufferExtended to defend itself

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Greg Stark
On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: How about the attached instead? This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. For what it's worth I've confirmed the bug in wal-e caused

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
I wrote: Greg Stark st...@mit.edu writes: (Or maybe the hot backup process could just catch the files in this state if a table is rapidly growing and it doesn't take care to avoid picking up new files that appear after it starts?) That's a possible explanation I guess, but it doesn't seem

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: How about the attached instead? This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. Should be fine; we could

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Greg Stark
On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: How about the attached instead? This does possibly allocate an extra block past the target block. I'm not sure how

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: This does possibly allocate an extra block past the target block. I'm not sure how surprising that would be for the rest of the code. Should be fine; we could

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
I wrote: Greg Stark st...@mit.edu writes: WAL-E actually didn't restore a whole 1GB file due to a transient S3 problem, in fact a bunch of them. Hah. Okay, I think we can write this issue off as closed then. Oh, wait a minute. It's not just a matter of whether we find the right block: we

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
I wrote: What I think we probably want to do is forcibly cause the target page to exist, using a P_NEW loop like what I committed, and then decide on the basis of whether it's all-zeroes whether to consider it invalid or not. This seems sane on the grounds that it's just the extension to the
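Tom's proposal (force the target page to exist with a P_NEW loop, then treat an all-zeroes page as the invalid case) can be modeled in a few lines. This is a toy Python model, not PostgreSQL's C code: `ToyRelation`, `extend()` and `read_page_for_redo()` are hypothetical names, and the real `XLogReadBufferExtended` works through shared buffers and smgr files rather than a list of byte arrays. It only illustrates the two steps: extend one block at a time until the target exists, then classify the page by whether it is entirely zero.

```python
BLCKSZ = 8192  # default PostgreSQL block size


class ToyRelation:
    """Hypothetical in-memory stand-in for one relation fork: a list of pages."""

    def __init__(self, nblocks=0):
        self.pages = [bytearray(BLCKSZ) for _ in range(nblocks)]

    def nblocks(self):
        return len(self.pages)

    def extend(self):
        """Append one zero-filled page (the model's analogue of P_NEW)."""
        self.pages.append(bytearray(BLCKSZ))
        return len(self.pages) - 1


def read_page_for_redo(rel, blkno):
    """Force block `blkno` to exist, then classify it.

    Returns (page, is_zero). An all-zero page is one the WAL record refers
    to but that held no data before replay; the caller decides whether that
    counts as an invalid-page reference.
    """
    while rel.nblocks() <= blkno:
        rel.extend()  # one block at a time, so we never overshoot blkno
    page = rel.pages[blkno]
    is_zero = not any(page)
    return page, is_zero
```

For example, `read_page_for_redo(ToyRelation(nblocks=2), 5)` extends blocks 2 through 5 and reports the target as all zeroes. Extending one block per iteration sidesteps the "extra block past the target" concern raised earlier in the thread, at least in this model.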

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Greg Stark
On Wed, Feb 12, 2014 at 8:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Oh, wait a minute. It's not just a matter of whether we find the right block: we also have to consider whether XLogReadBufferExtended will apply the right mode behavior. Currently, it supposes that all pages past the

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-12 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Wed, Feb 12, 2014 at 8:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Oh, wait a minute. It's not just a matter of whether we find the right block: we also have to consider whether XLogReadBufferExtended will apply the right mode behavior. Currently, it

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-11 Thread Greg Stark
On Sun, Feb 9, 2014 at 2:54 PM, Greg Stark st...@mit.edu wrote: Bad block's page header -- this is in the 56th relation segment: =# select (page_header(E'\\x2005583b05aa050028001805002004201098e00f2090e00f088d24061885e00f')).*; lsn | tli | flags |

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-09 Thread Greg Stark
On Thu, Feb 6, 2014 at 11:41 PM, Greg Stark st...@mit.edu wrote: That doesn't explain the other instance or the other copies of this database. I think the most productive thing I can do is switch my attention to the other database to see if it really looks like the same problem. So here's an

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-07 Thread Andres Freund
On 2014-02-06 20:06:03 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: That reminds me, not that I directly see how it could be responsible, there's still 20131029011623.gj20...@awork2.anarazel.de ff. around. I don't think we came to an agreement in that thread how to fix

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Greg Stark
On Mon, Feb 3, 2014 at 12:02 AM, Tom Lane t...@sss.pgh.pa.us wrote: What version were you running before 9.1.11 exactly? I took a look through all the diffs from 9.1.9 up to 9.1.11, and couldn't find any changes that seemed even vaguely related to this. There are some changes in

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Tom Lane
Greg Stark st...@mit.edu writes: Both the primary and the standby were 9.1.11 from the get-go. The database the primary was forked off of was 9.1.10 but as far as I can tell the primary in the current pair has no problems. What's worse is we created a new standby from the same base backup and

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Greg Stark
On Thu, Feb 6, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: I had noticed that the WAL records that were mis-replayed seemed to be bunched pretty close together (two of them even adjacent). Could you confirm that? If so, it seems like we're looking for some condition that makes

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Andres Freund
On 2014-02-06 23:41:19 +0100, Greg Stark wrote: The problem with the bgwriter being at fault is that from what I can see the bgwriter will never extend a file. That means the xlog recovery code must have done it. That means even if the bgwriter came along and looked at the buffer we just read

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Greg Stark
On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote: That's not necessarily true. If e.g. the buffer mapping would change racily, the resulting write from the bgwriter could very well end up increasing the file size, leaving a hole in between its write and the original

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote: That's not necessarily true. If e.g. the buffer mapping would change racily, the resulting write from the bgwriter could very well end up increasing the file size, leaving a hole

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Andres Freund
On 2014-02-06 18:42:05 -0500, Tom Lane wrote: Greg Stark st...@mit.edu writes: On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote: That's not necessarily true. If e.g. the buffer mapping would change racily, the resulting write from the bgwriter could very well end

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: That reminds me, not that I directly see how it could be responsible, there's still 20131029011623.gj20...@awork2.anarazel.de ff. around. I don't think we came to an agreement in that thread how to fix the problem. Hm, yeah. I'm not sure I believe

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-02 Thread Greg Stark
I've poked at this a bit more. There are at least 10 relations where the last block doesn't match the block mentioned in the xlog record that its LSN indicates. At least it looks like from the info xlogdump prints. Including two blocks where the correct block has the same LSN which maybe means

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-02 Thread Greg Stark
Hm, I'm not entirely convinced those are erroneous replays to wrong blocks. They don't look right but there are no blocks of NULs preceding them. So if they're wrong then they only extended the relations by a single block. The relfilenodes that have nul blocks before the last block are:

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-02 Thread Tom Lane
Greg Stark st...@mit.edu writes: The relfilenodes that have nul blocks before the last block are: Can we see the associated WAL records (ie, the ones matching the LSNs in the last blocks of these files)? regards, tom lane

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-02 Thread Greg Stark
On Sun, Feb 2, 2014 at 6:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: The relfilenodes that have nul blocks before the last block are: Can we see the associated WAL records (ie, the ones matching the LSNs in the last blocks of these files)? Sorry, I've lost

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-02 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Sun, Feb 2, 2014 at 6:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Can we see the associated WAL records (ie, the ones matching the LSNs in the last blocks of these files)? Sorry, I've lost track of what information I already shared or didn't, Hm. So one of

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-01 Thread Greg Stark
On Fri, Jan 31, 2014 at 8:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: So on a filesystem that supports holes in files, I'd expect that the added segments would be fully allocated if XLogReadBufferExtended did the deed, but they'd be quite small if _mdfd_getseg did so. The du results you
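Tom's test distinguishes the two code paths by allocation: zero pages written by XLogReadBufferExtended occupy disk, while segments created by _mdfd_getseg seeking past the end should be mostly holes. A minimal Python sketch of the `du`-versus-`ls -l` comparison follows; `allocated_fraction` is a hypothetical helper, and it assumes `st_blocks` is reported in 512-byte units, which holds on Linux and most Unix systems but is not guaranteed by POSIX (Windows does not provide it at all).

```python
import os


def allocated_fraction(path):
    """Return allocated bytes divided by apparent size for one file.

    Near 1.0 suggests the file is fully allocated; a much smaller value
    suggests it contains holes. Heuristic only: relies on st_blocks being
    counted in 512-byte units.
    """
    st = os.stat(path)
    if st.st_size == 0:
        return 1.0
    return st.st_blocks * 512 / st.st_size
```

Run over the segments of the relfilenode discussed later in the thread (tablespace 1663 is pg_default, so they would sit under `base/16385/1261982.*`), this gives a per-segment version of the `du` numbers.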

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-01 Thread Greg Stark
The plot thickens... Looking at the next relation I see a similar symptom of a single valid block at the end of several segments of nuls. This relation is also a btree on the same table and has a header in the near vicinity of the xlog: d9de7pcqls4ib6=# select

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Sun, Jan 26, 2014 at 5:45 PM, Andres Freund and...@2ndquadrant.com wrote: We're also seeing log entries about wal contains reference to invalid pages but these errors seem only vaguely correlated. Sometimes we get the errors but the tables don't grow noticeably and sometimes we don't get

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 11:09:14 +0000, Greg Stark wrote: On Sun, Jan 26, 2014 at 5:45 PM, Andres Freund and...@2ndquadrant.com wrote: We're also seeing log entries about wal contains reference to invalid pages but these errors seem only vaguely correlated. Sometimes we get the errors but the

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 11:09:14 +0000, Greg Stark wrote: On Sun, Jan 26, 2014 at 5:45 PM, Andres Freund and...@2ndquadrant.com wrote: We're also seeing log entries about wal contains reference to invalid pages but these errors seem only vaguely correlated. Sometimes we get the errors but the

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 11:26 AM, Andres Freund and...@2ndquadrant.com wrote: The slightly more likely explanation for transient errors is that you hit the vacuum bug (061b079f89800929a863a692b952207cadf15886). That had only taken effect if HS has already assembled a snapshot, which can make

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 11:46:09 +0000, Greg Stark wrote: On Fri, Jan 31, 2014 at 11:26 AM, Andres Freund and...@2ndquadrant.com wrote: The slightly more likely explanation for transient errors is that you hit the vacuum bug (061b079f89800929a863a692b952207cadf15886). That had only taken effect if

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
1261982.53 is entirely nuls. I think that's true for most if not all of the intervening files, still investigating. The 54th segment is nul up to offset 1f0c0000 after which it has valid looking blocks: # hexdump 1261982.54 | head -100 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 1f0c0000
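`hexdump` collapses runs of identical lines into `*`, so the first offset after the star marks where real data begins. Scanning at block granularity gives the same answer directly as a block index. The helper below is a hypothetical sketch that assumes the default 8 KiB block size.

```python
BLCKSZ = 8192  # default PostgreSQL block size


def first_nonzero_block(path, blcksz=BLCKSZ):
    """Return (block index, byte offset) of the first block in `path` that
    contains any non-zero byte, or None if the file is entirely zeroes."""
    with open(path, "rb") as f:
        index = 0
        while True:
            buf = f.read(blcksz)
            if not buf:
                return None
            if buf.strip(b"\x00"):  # anything left once NULs are stripped?
                return index, index * blcksz
            index += 1
```

`bytes.strip` runs in C, so this stays reasonably fast even on a full 1 GiB segment, unlike iterating byte by byte in Python.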

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 14:39:47 +0000, Greg Stark wrote: 1261982.53 is entirely nuls. I think that's true for most if not all of the intervening files, still investigating. The 54th segment is nul up to offset 1f0c0000 after which it has valid looking blocks: It'd be interesting to dump the page

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 2:39 PM, Greg Stark st...@mit.edu wrote: [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, info:8, prev:EA1/635290] bkpblock[1]: s/d/r:1663/16385/1261982 blk:3634978 hole_off/len:1240/2072 [cur:EA1/638988, xid:1418089147, rmid:11(Btree),
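The backup block in this record names block 3634978 of relfilenode 1261982 in database 16385 (tablespace 1663, pg_default). A small sketch of where that block should live on disk, assuming a default build (8 KiB blocks, 1 GiB segments, so 131072 blocks per segment file); the helper names are hypothetical.

```python
BLCKSZ = 8192          # default block size
RELSEG_SIZE = 131072   # blocks per segment file: 1 GiB / 8 KiB (default build)


def segment_for_block(blkno):
    """Map a block number to (segment number, byte offset within that file)."""
    segno, blk_in_seg = divmod(blkno, RELSEG_SIZE)
    return segno, blk_in_seg * BLCKSZ


def segment_filename(relfilenode, segno):
    """Segment 0 has no suffix; later segments are '<relfilenode>.<segno>'."""
    return str(relfilenode) if segno == 0 else f"{relfilenode}.{segno}"


segno, offset = segment_for_block(3634978)
print(segment_filename(1261982, segno), hex(offset))  # 1261982.27 0x2ee44000
```

Under these assumptions the record's block belongs in segment `.27`, far from the `.54` segment where the data actually turned up.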

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 14:59:21 +0000, Greg Stark wrote: On Fri, Jan 31, 2014 at 2:39 PM, Greg Stark st...@mit.edu wrote: [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, info:8, prev:EA1/635290] bkpblock[1]: s/d/r:1663/16385/1261982 blk:3634978 hole_off/len:1240/2072

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 3:08 PM, Andres Freund and...@2ndquadrant.com wrote: It points to the end of the record (i.e. the beginning of the next). It needs to, because otherwise XLogFlush()es on the pd_lsn wouldn't flush enough. Ah, in which case the relevant record is: [cur:EA1/637140,

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 15:15:24 +0000, Greg Stark wrote: On Fri, Jan 31, 2014 at 3:08 PM, Andres Freund and...@2ndquadrant.com wrote: It points to the end of the record (i.e. the beginning of the next). It needs to, because otherwise XLogFlush()es on the pd_lsn wouldn't flush enough. Ah, in

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 3:19 PM, Andres Freund and...@2ndquadrant.com wrote: =# select get_raw_page('data_pkey', 'main', 11073632) ; ERROR: block number 11073632 is out of range for relation data_pkey Isn't the page 3634978? The page in the record is. But the page on disk is in the 54th segment at offset 1F0C0000

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 15:21:35 +0000, Greg Stark wrote: On Fri, Jan 31, 2014 at 3:19 PM, Andres Freund and...@2ndquadrant.com wrote: =# select get_raw_page('data_pkey', 'main', 11073632) ; ERROR: block number 11073632 is out of range for relation data_pkey Isn't the page 3634978? The page in

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: It's interesting that the smgr gets this wrong then (as also evidenced by the fact that relation_size does as well). Could you please do a ls -l path/to/relfilenode*? IIRC, smgrnblocks will stop as soon as it finds a segment that is not 1GB in size.
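Tom's description of smgrnblocks suggests one way the size reported through smgr can disagree with what is on disk: one short segment hides every segment after it. The following is a simplified Python model of that rule, not md.c itself (which also raises an error on oversized segments); `nblocks_from_segment_sizes` is a hypothetical name and default 8 KiB blocks with 1 GiB segments are assumed.

```python
BLCKSZ = 8192
RELSEG_BYTES = 1 << 30  # default 1 GiB segment size


def nblocks_from_segment_sizes(sizes):
    """Walk segment sizes in order (sizes[0] is '<relfilenode>', sizes[1] is
    '.1', ...) and stop at the first segment shorter than a full 1 GiB."""
    total = 0
    for size in sizes:
        total += size // BLCKSZ
        if size < RELSEG_BYTES:
            break  # later segments, even if present on disk, are not counted
    return total
```

With sizes `[1 GiB, 8 KiB, 1 GiB]` the model reports 131073 blocks, ignoring the third file entirely.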

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 10:33:16 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: It's interesting that the smgr gets this wrong then (as also evidenced by the fact that relation_size does as well). Could you please do a ls -l path/to/relfilenode*? IIRC, smgrnblocks will stop as

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Fri, Jan 31, 2014 at 3:19 PM, Andres Freund and...@2ndquadrant.com wrote: Isn't the page 3634978? The page in the record is. But the page on disk is in the 54th segment at offset 1F0C0000. So unless my arithmetic is wrong: bc -l ibase=16 400 * 400 *

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
Sorry guys. I transposed two numbers when looking up the relation. data_pk wasn't the right index. =# select (page_header(get_raw_page('index_data_id', 'main', 3020854))).* ; lsn | tli | flags | lower | upper | special | pagesize | version | prune_xid

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Greg Stark st...@mit.edu writes: Sorry guys. I transposed two numbers when looking up the relation. data_pk wasn't the right index. =# select (page_header(get_raw_page('index_data_id', 'main', 3020854))).* ; lsn | tli | flags | lower | upper | special | pagesize | version |

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 3:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: 400 * 400 * 400 / 2000 * 54 + 1F0C0000 / 2000 11073632 Oops, it's reading 54 in hex there. # select ((2^30) * 54.0 + 'x1F0C0000'::bit(32)::int) / 8192; ?column? -- 7141472 ibase=16 400 * 400 * 400 / 2000 * 36 +
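The same conversion in the other direction, showing the hex pitfall explicitly. The byte offset 0x1F0C0000 is the value implied by the 7141472 result above (7141472 - 54 × 131072 = 63584 blocks = 0x1F0C0000 bytes); the function name is hypothetical and default block and segment sizes are assumed.

```python
def block_for_segment_offset(segno, offset, blcksz=8192, seg_bytes=1 << 30):
    """Segment number plus byte offset within that file -> block number."""
    return (segno * seg_bytes + offset) // blcksz


# Segment file ".54" (decimal 54) at byte offset 0x1F0C0000:
print(block_for_segment_offset(54, 0x1F0C0000))    # 7141472
# Reading "54" as hexadecimal, as bc does after ibase=16:
print(block_for_segment_offset(0x54, 0x1F0C0000))  # 11073632
```

With `ibase=16`, bc parses every literal as hex, including the segment number; writing it as 36 (hex for decimal 54) gives the correct answer.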

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Andres Freund
On 2014-01-31 10:33:16 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: It's interesting that the smgr gets this wrong then (as also evidenced by the fact that relation_size does as well). Could you please do a ls -l path/to/relfilenode*? IIRC, smgrnblocks will stop as

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
So just to summarize, this xlog record: [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, info:8, prev:EA1/635290] insert_leaf: s/d/r:1663/16385/1261982 tid 3634978/282 [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, info:8, prev:EA1/635290] bkpblock[1]:

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Greg Stark st...@mit.edu writes: So just to summarize, this xlog record: [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194, info:8, prev:EA1/635290] insert_leaf: s/d/r:1663/16385/1261982 tid 3634978/282 [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
One thing I keep coming back to is a bad RAM chip setting a bit in the block number. But I just can't seem to get it to add up. The difference is not a power of two, it had happened on two different machines, and we don't see other weirdness on the machine. It seems like a strange coincidence it
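Why "not a power of two" rules out a single flipped bit: flipping bit k changes a number by exactly ±2^k, so a one-bit corruption always leaves a power-of-two difference. The XOR test checks the bit pattern directly. A short sketch, with hypothetical helper names, applied for illustration to the record's block 3634978 and the on-disk block 7141472 computed earlier in the thread:

```python
def differs_by_one_bit(a, b):
    """True if a and b differ in exactly one bit position."""
    x = a ^ b
    return x != 0 and (x & (x - 1)) == 0


def differs_by_power_of_two(a, b):
    """True if |a - b| is a power of two (necessary for a one-bit flip)."""
    d = abs(a - b)
    return d != 0 and (d & (d - 1)) == 0


print(differs_by_one_bit(3634978, 7141472))       # False
print(differs_by_power_of_two(3634978, 7141472))  # False
```

The power-of-two check is weaker (5 and 7 pass both, but 3 and 5 differ by 2 while differing in two bits), so the XOR form is the one to use when testing the bit-flip theory.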

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Greg Stark st...@mit.edu writes: One thing I keep coming back to is a bad RAM chip setting a bit in the block number. But I just can't seem to get it to add up. The difference is not a power of two, it had happened on two different machines, and we don't see other weirdness on the machine. It

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Josh Berkus
On 01/31/2014 01:11 PM, Tom Lane wrote: Greg Stark st...@mit.edu writes: One thing I keep coming back to is a bad RAM chip setting a bit in the block number. But I just can't seem to get it to add up. The difference is not a power of two, it had happened on two different machines, and we don't

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Greg Stark
On Fri, Jan 31, 2014 at 10:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah, I'd been wondering if the WAL record somehow got corrupted while in memory (presumably after being CRC-checked). It's a bit hard to see how though. One thing I mentioned early on but bears repeating is that this

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-31 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes: FWIW, we've periodically seen reports from our clients of replica databases being slightly larger than the master. Nothing reproducible or as severe as Greg's issue, or we'd have reported it. But this could be a more widespread issue, just that it

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-26 Thread Andres Freund
Hi, On 2014-01-24 19:23:28 -0500, Greg Stark wrote: Since the point release we've run into a number of databases that when we restore from a base backup end up being larger than the primary database was. Sometimes by a large factor. The data below is from 9.1.11 (both primary and standby) but

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-26 Thread Greg Stark
On Sun, Jan 26, 2014 at 9:45 AM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2014-01-24 19:23:28 -0500, Greg Stark wrote: Since the point release we've run into a number of databases that when we restore from a base backup end up being larger than the primary database was. Sometimes

[HACKERS] Recovery inconsistencies, standby much larger than primary

2014-01-24 Thread Greg Stark
Since the point release we've run into a number of databases that when we restore from a base backup end up being larger than the primary database was. Sometimes by a large factor. The data below is from 9.1.11 (both primary and standby) but we've seen the same thing on 9.2.6. primary$ for i in
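The original report starts by comparing per-file sizes on the primary and the standby with a shell loop (cut off in this excerpt). A hypothetical Python analogue sums each relfilenode's main-fork segments, so that relations split across many 1 GiB files compare as one number. It assumes both `base/<database OID>` directories are reachable; physical replication keeps relfilenodes identical on both sides. The 1.5 threshold is an arbitrary choice for illustration.

```python
import os
import re
from collections import defaultdict

# Main-fork files are "<relfilenode>" or "<relfilenode>.<segno>"; other forks
# ("_fsm", "_vm", "_init") and files like PG_VERSION do not match.
SEGMENT_RE = re.compile(r"(\d+)(?:\.(\d+))?")


def relation_sizes(db_dir):
    """Total apparent size in bytes of each relfilenode's main-fork segments."""
    totals = defaultdict(int)
    for name in os.listdir(db_dir):
        m = SEGMENT_RE.fullmatch(name)
        if m:
            totals[int(m.group(1))] += os.path.getsize(os.path.join(db_dir, name))
    return dict(totals)


def larger_on_standby(primary, standby, min_ratio=1.5):
    """Relfilenodes whose standby total exceeds min_ratio times the primary's."""
    return sorted(r for r, size in standby.items()
                  if size > min_ratio * primary.get(r, 0))
```

Apparent size is what `ls -l` shows; combining this with an allocation check would also reveal whether the extra space on the standby is real data or holes.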