On Sat, Feb 15, 2014 at 11:45 AM, Andres Freund and...@2ndquadrant.com wrote:
I guess the theoretically correct thing would be to make all WAL records
about truncation and unlinking contain the current size of the relation,
but especially with deletions and forks that will probably turn out to
On 2014-02-14 22:30:45 -0500, Tom Lane wrote:
Andres Freund and...@2ndquadrant.com writes:
On 2014-02-14 20:46:01 +, Greg Stark wrote:
Going over this I think this is still a potential issue:
On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote:
I am not sure that explains
Going over this I think this is still a potential issue:
On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote:
I am not sure that explains the issue, but I think the redo action for
truncation is not safe across crashes. An XLOG_SMGR_TRUNCATE will just
do a smgrtruncate() (and
On 2014-02-14 20:46:01 +, Greg Stark wrote:
Going over this I think this is still a potential issue:
On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote:
I am not sure that explains the issue, but I think the redo action for
truncation is not safe across crashes. A
Andres Freund and...@2ndquadrant.com writes:
On 2014-02-14 20:46:01 +, Greg Stark wrote:
Going over this I think this is still a potential issue:
On 31 Jan 2014 15:56, Andres Freund and...@2ndquadrant.com wrote:
I am not sure that explains the issue, but I think the redo action for
Hi all,
On 02/12/2014 08:27 PM, Greg Stark wrote:
On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Greg Stark st...@mit.edu writes:
For what it's worth I've confirmed the bug in wal-e caused the initial
problem.
Huh? Bug in wal-e? What bug?
WAL-E actually didn't
I think what you're arguing is that we should see WAL records filling the
rest of segment 1 before we see any references to segment 2, but if that's
the case then how did we get into the situation you reported? Or is it
just that it was a broken base backup to start with?
The scenario I
Greg Stark st...@mit.edu writes:
I think what you're arguing is that we should see WAL records filling the
rest of segment 1 before we see any references to segment 2, but if that's
the case then how did we get into the situation you reported? Or is it
just that it was a broken base backup to
On Thu, Feb 13, 2014 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
The scenario I could come up with that didn't require a broken base backup
was that there was an earlier truncate or vacuum. So the sequence is high
offset reference, truncate, growth, crash. All possibly on a single
database.
Greg Stark st...@mit.edu writes:
On Thu, Feb 13, 2014 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
That's what's bothering me, too. On the other hand, if we can't think of
a scenario where it'd be necessary to replay the high-offset update, then
I'm disinclined to mess with the code
So I think I've come up with a scenario that could cause this. I don't
think it's exactly what happened here but maybe something analogous
happened with our base backup restore.
On the primary you extend a table a bunch, including adding new
segments, but crash before committing (or
So here's my attempt to rewrite this logic. I ended up refactoring a
bit because I found it unnecessarily confusing having the mode
branches in several places. I think it's much clearer just having two
separate pieces of logic for RBM_NEW and the extension cases since all
they have in common is
Greg Stark st...@mit.edu writes:
So here's my attempt to rewrite this logic. I ended up refactoring a
bit because I found it unnecessarily confusing having the mode
branches in several places. I think it's much clearer just having two
separate pieces of logic for RBM_NEW and the extension
Greg Stark st...@mit.edu writes:
So I think I've come up with a scenario that could cause this. I don't
think it's exactly what happened here but maybe something analogous
happened with our base backup restore.
I agree it seems like a good idea for XLogReadBufferExtended to defend
itself
On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
How about the attached instead?
This does possibly allocate an extra block past the target block. I'm
not sure how surprising that would be for the rest of the code.
For what it's worth I've confirmed the bug in wal-e caused
I wrote:
Greg Stark st...@mit.edu writes:
(Or maybe the hot backup
process could just catch the files in this state if a table is rapidly
growing and it doesn't take care to avoid picking up new files that
appear after it starts?)
That's a possible explanation I guess, but it doesn't seem
Greg Stark st...@mit.edu writes:
On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
How about the attached instead?
This does possibly allocate an extra block past the target block. I'm
not sure how surprising that would be for the rest of the code.
Should be fine; we could
On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Greg Stark st...@mit.edu writes:
On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
How about the attached instead?
This does possibly allocate an extra block past the target block. I'm
not sure how
Greg Stark st...@mit.edu writes:
On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Greg Stark st...@mit.edu writes:
This does possibly allocate an extra block past the target block. I'm
not sure how surprising that would be for the rest of the code.
Should be fine; we could
I wrote:
Greg Stark st...@mit.edu writes:
WAL-E actually didn't restore a whole 1GB file due to a transient S3
problem, in fact a bunch of them.
Hah. Okay, I think we can write this issue off as closed then.
Oh, wait a minute. It's not just a matter of whether we find the right
block: we
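Greg traced the original failure to WAL-E not restoring several whole 1 GB files. A post-restore sanity check that looks only at file names and sizes can catch that class of problem. This is a minimal sketch. It assumes the default 1 GB segment size and the main-fork naming relfilenode, relfilenode.1, relfilenode.2, and so on. check_segments is a hypothetical helper, and it says nothing about file contents.

```python
import os
import re

SEGMENT_BYTES = 1 << 30  # assumption: default 1 GB segment size


def check_segments(directory, relfilenode, segment_bytes=SEGMENT_BYTES):
    """Report problems with the main-fork segment files of one relation.

    Every segment before the highest-numbered one should exist and be
    exactly segment_bytes long; only the last segment may be short.
    Returns a list of (segment_number, description) tuples.
    """
    # Matches "12345" and "12345.N" but not "12345_fsm" or "123456".
    pattern = re.compile(r"^%s(?:\.(\d+))?$" % re.escape(str(relfilenode)))
    present = {}
    for name in os.listdir(directory):
        m = pattern.match(name)
        if m:
            present[int(m.group(1) or 0)] = name
    problems = []
    if not present:
        return problems
    for segno in range(max(present)):  # the last segment is allowed to be short
        name = present.get(segno)
        if name is None:
            problems.append((segno, "missing"))
            continue
        size = os.path.getsize(os.path.join(directory, name))
        if size != segment_bytes:
            problems.append((segno, "short: %d bytes" % size))
    return problems
```

A restore that silently dropped or truncated a middle segment shows up as a "missing" or "short" entry. A short final segment is normal and is not reported.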
I wrote:
What I think we probably want to do is forcibly cause the target page
to exist, using a P_NEW loop like what I committed, and then decide
on the basis of whether it's all-zeroes whether to consider it invalid
or not. This seems sane on the grounds that it's just the extension
to the
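A toy model can illustrate the approach described above. It extends the relation one page at a time, which is the role P_NEW plays, until the target page exists. It then treats an all-zeroes page as invalid when the caller needs an existing page. Everything here is hypothetical. ToyRelation stands in for a relation fork. zero_ok loosely stands in for the difference between the RBM_ZERO and RBM_NORMAL read modes. invalid_pages stands in for recovery's invalid-page bookkeeping. This is not XLogReadBufferExtended.

```python
PAGE_SIZE = 8192  # assumption: 8 kB pages
ZERO_PAGE = bytes(PAGE_SIZE)


class ToyRelation:
    """In-memory stand-in for a relation fork: a list of page images."""

    def __init__(self, npages=0):
        self.pages = [ZERO_PAGE for _ in range(npages)]

    def extend(self):
        """Like reading P_NEW: append one zero-filled page, return its number."""
        self.pages.append(ZERO_PAGE)
        return len(self.pages) - 1


def read_for_redo(rel, blkno, zero_ok, invalid_pages):
    """Toy version of the proposal (hypothetical names throughout).

    1. Extend one page at a time until blkno exists; the loop stops as
       soon as it does, so this model never allocates past blkno.
    2. If the caller needs real contents (zero_ok is False) and the page
       is all zeroes, remember it as invalid instead of returning it.
    """
    while len(rel.pages) <= blkno:
        rel.extend()
    page = rel.pages[blkno]
    if not zero_ok and page == ZERO_PAGE:
        invalid_pages.add(blkno)
        return None
    return page
```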
On Wed, Feb 12, 2014 at 8:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Oh, wait a minute. It's not just a matter of whether we find the right
block: we also have to consider whether XLogReadBufferExtended will
apply the right mode behavior. Currently, it supposes that all pages
past the
Greg Stark st...@mit.edu writes:
On Wed, Feb 12, 2014 at 8:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Oh, wait a minute. It's not just a matter of whether we find the right
block: we also have to consider whether XLogReadBufferExtended will
apply the right mode behavior. Currently, it
On Sun, Feb 9, 2014 at 2:54 PM, Greg Stark st...@mit.edu wrote:
Bad block's page header -- this is in the 56th relation segment:
=# select
(page_header(E'\\x2005583b05aa050028001805002004201098e00f2090e00f088d24061885e00f')).*;
lsn | tli | flags |
On Thu, Feb 6, 2014 at 11:41 PM, Greg Stark st...@mit.edu wrote:
That doesn't explain the other instance or the other copies of this
database. I think the most productive thing I can do is switch my
attention to the other database to see if it really looks like the
same problem.
So here's an
On 2014-02-06 20:06:03 -0500, Tom Lane wrote:
Andres Freund and...@2ndquadrant.com writes:
That reminds me, not that I directly see how it could be responsible,
there's still 20131029011623.gj20...@awork2.anarazel.de ff. around. I
don't think we came to an agreement in that thread how to fix
On Mon, Feb 3, 2014 at 12:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
What version were you running before 9.1.11 exactly? I took a look
through all the diffs from 9.1.9 up to 9.1.11, and couldn't find any
changes that seemed even vaguely related to this. There are some
changes in
Greg Stark st...@mit.edu writes:
Both the primary and the standby were 9.1.11 from the get-go. The
database the primary was forked off of was 9.1.10 but as far as I can
tell the primary in the current pair has no problems.
What's worse is we created a new standby from the same base backup and
On Thu, Feb 6, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I had noticed that the WAL records that were mis-replayed seemed to
be bunched pretty close together (two of them even adjacent). Could
you confirm that? If so, it seems like we're looking for some condition
that makes
On 2014-02-06 23:41:19 +0100, Greg Stark wrote:
The problem with the bgwriter being at fault is that from what I can
see the bgwriter will never extend a file. That means the xlog
recovery code must have done it. That means even if the bgwriter came
along and looked at the buffer we just read
On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote:
That's not necessarily true. If e.g. the buffer mapping would change
racily, the resulting write from the bgwriter could very well end up
increasing the file size, leaving a hole in between its write and the
original
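Andres's point is that a write aimed at a block beyond the current end of file extends the file by itself. The following is a minimal sketch of that filesystem behaviour. It assumes POSIX pwrite semantics (os.pwrite is Unix-only) and 8 kB blocks. write_block is a hypothetical helper, not the bgwriter's code path.

```python
import os

BLCKSZ = 8192  # assumption: default block size


def write_block(path, blkno, payload, blcksz=BLCKSZ):
    """Write one block at byte offset blkno * blcksz (hypothetical helper).

    If that offset lies past the current end of file, the write extends the
    file; the skipped range is never written and reads back as zeros.
    """
    if len(payload) != blcksz:
        raise ValueError("payload must be exactly one block")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, payload, blkno * blcksz)
    finally:
        os.close(fd)
```

For example, writing block 3 of a one-block file leaves blocks 1 and 2 as a hole of zeros. On filesystems that support holes, that range typically occupies no disk space until something writes to it.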
Greg Stark st...@mit.edu writes:
On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote:
That's not necessarily true. If e.g. the buffer mapping would change
racily, the resulting write from the bgwriter could very well end up
increasing the file size, leaving a hole
On 2014-02-06 18:42:05 -0500, Tom Lane wrote:
Greg Stark st...@mit.edu writes:
On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote:
That's not necessarily true. If e.g. the buffer mapping would change
racily, the resulting write from the bgwriter could very well end
Andres Freund and...@2ndquadrant.com writes:
That reminds me, not that I directly see how it could be responsible,
there's still 20131029011623.gj20...@awork2.anarazel.de ff. around. I
don't think we came to an agreement in that thread how to fix the
problem.
Hm, yeah. I'm not sure I believe
I've poked at this a bit more. There are at least 10 relations where
the last block doesn't match the block mentioned in the xlog record
that its LSN indicates. At least, that is how it looks from the info
xlogdump prints.
Including two blocks where the correct block has the same LSN which
maybe means
Hm, I'm not entirely convinced those are erroneous replays to wrong
blocks. They don't look right but there are no blocks of NULs
preceding them. So if they're wrong then they only extended the
relations by a single block.
The relfilenodes that have nul blocks before the last block are:
Greg Stark st...@mit.edu writes:
The relfilenodes that have nul blocks before the last block are:
Can we see the associated WAL records (ie, the ones matching the LSNs
in the last blocks of these files)?
regards, tom lane
On Sun, Feb 2, 2014 at 6:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Greg Stark st...@mit.edu writes:
The relfilenodes that have nul blocks before the last block are:
Can we see the associated WAL records (ie, the ones matching the LSNs
in the last blocks of these files)?
Sorry, I've lost
Greg Stark st...@mit.edu writes:
On Sun, Feb 2, 2014 at 6:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Can we see the associated WAL records (ie, the ones matching the LSNs
in the last blocks of these files)?
Sorry, I've lost track of what information I already shared or didn't,
Hm. So one of
On Fri, Jan 31, 2014 at 8:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
So on a filesystem that supports holes
in files, I'd expect that the added segments would be fully allocated
if XLogReadBufferExtended did the deed, but they'd be quite small if
_mdfd_getseg did so. The du results you
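Tom's du-based test compares the space a segment occupies on disk with its length. The following minimal sketch runs the same check on a single file. It assumes a Unix system, because st_blocks is not available on Windows, and Python documents st_blocks as a count of 512-byte blocks. The helper names are hypothetical.

```python
import os


def allocation_report(path):
    """Return (apparent_size, allocated_bytes) for a file.

    st_blocks counts 512-byte units (per the Python docs), independent of
    the filesystem's own block size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def looks_sparse(path):
    """True if fewer bytes are allocated on disk than the file's length."""
    apparent, allocated = allocation_report(path)
    return allocated < apparent
```

A segment whose zero blocks were actually written should report allocated space close to its length. A segment that was merely created and then written at a high offset should report much less. Filesystems that compress zero-filled blocks can blur this distinction.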
The plot thickens...
Looking at the next relation I see a similar symptom of a single valid
block at the end of several segments of nuls. This relation is also a
btree on the same table and has a header in the near vicinity of the
xlog:
d9de7pcqls4ib6=# select
On Sun, Jan 26, 2014 at 5:45 PM, Andres Freund and...@2ndquadrant.com wrote:
We're also seeing log entries about "WAL contains references to invalid
pages", but these errors seem only vaguely correlated. Sometimes we get
the errors but the tables don't grow noticeably and sometimes we don't
get
On 2014-01-31 11:09:14 +, Greg Stark wrote:
On Sun, Jan 26, 2014 at 5:45 PM, Andres Freund and...@2ndquadrant.com wrote:
We're also seeing log entries about "WAL contains references to invalid
pages", but these errors seem only vaguely correlated. Sometimes we get
the errors but the
On Fri, Jan 31, 2014 at 11:26 AM, Andres Freund and...@2ndquadrant.com wrote:
The slightly more likely explanation for transient errors is that you
hit the vacuum bug (061b079f89800929a863a692b952207cadf15886). That only
took effect if HS had already assembled a snapshot, which can make
On 2014-01-31 11:46:09 +, Greg Stark wrote:
On Fri, Jan 31, 2014 at 11:26 AM, Andres Freund and...@2ndquadrant.com wrote:
The slightly more likely explanation for transient errors is that you
hit the vacuum bug (061b079f89800929a863a692b952207cadf15886). That only
took effect if
1261982.53 is entirely nuls. I think that's true for most if not all
of the intervening files, still investigating.
The 54th segment is nul up to offset 1f0c after which it has valid
looking blocks:
# hexdump 1261982.54 | head -100
000
*
1f0c
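Greg found where the real data starts by reading hexdump's "*" (repeated-line) marker. The following sketch performs the same scan one block at a time. It assumes 8 kB blocks, and the function name is hypothetical. It returns the index of the first block that is not all zero bytes, together with that block's byte offset. Passing the offset to hex() makes it easy to compare with hexdump's offsets.

```python
BLCKSZ = 8192  # assumption: default block size


def first_nonzero_block(path, blcksz=BLCKSZ):
    """Return (block_index, byte_offset) of the first block that is not all
    zero bytes, or None if the whole file is zeros."""
    zero = bytes(blcksz)
    with open(path, "rb") as f:
        blkno = 0
        while True:
            chunk = f.read(blcksz)
            if not chunk:
                return None
            # A short final read is compared against an equally short zero run.
            if chunk != zero[:len(chunk)]:
                return blkno, blkno * blcksz
            blkno += 1
```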
On 2014-01-31 14:39:47 +, Greg Stark wrote:
1261982.53 is entirely nuls. I think that's true for most if not all
of the intervening files, still investigating.
The 54th segment is nul up to offset 1f0c after which it has valid
looking blocks:
It'd be interesting to dump the page
On Fri, Jan 31, 2014 at 2:39 PM, Greg Stark st...@mit.edu wrote:
[cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,
info:8, prev:EA1/635290] bkpblock[1]: s/d/r:1663/16385/1261982
blk:3634978 hole_off/len:1240/2072
[cur:EA1/638988, xid:1418089147, rmid:11(Btree),
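Each bkpblock entry carries a full-page image with the page's free-space "hole" omitted. For a standard page, the hole runs from pd_lower to pd_upper. With 8 kB pages, hole_off/len 1240/2072 therefore means the record stores 8192 - 2072 = 6120 bytes of image. The following minimal sketch shows how the page is reassembled. The function name is hypothetical.

```python
BLCKSZ = 8192  # assumption: default 8 kB pages


def restore_backup_block(image, hole_offset, hole_length, blcksz=BLCKSZ):
    """Rebuild a full page from a backup-block image with its hole removed.

    The omitted bytes are free space, so they are put back as zeros.
    """
    if len(image) != blcksz - hole_length:
        raise ValueError("image length does not match blcksz - hole_length")
    return image[:hole_offset] + bytes(hole_length) + image[hole_offset:]
```

For the record above, this implies that pd_lower was 1240 and pd_upper was 1240 + 2072 = 3312 on the backed-up page, assuming the standard-page case applies.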
On 2014-01-31 14:59:21 +, Greg Stark wrote:
On Fri, Jan 31, 2014 at 2:39 PM, Greg Stark st...@mit.edu wrote:
[cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,
info:8, prev:EA1/635290] bkpblock[1]: s/d/r:1663/16385/1261982
blk:3634978 hole_off/len:1240/2072
On Fri, Jan 31, 2014 at 3:08 PM, Andres Freund and...@2ndquadrant.com wrote:
It points to the end of the record (i.e. the beginning of the next). It
needs to, because otherwise XLogFlush()es on the pd_lsn wouldn't flush
enough.
Ah, in which case the relevant record is:
[cur:EA1/637140,
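Andres's rule can be turned into a lookup. pd_lsn is the end of the last record that touched the page, which is also where the next record starts. The record of interest is therefore the last one that starts strictly before pd_lsn. The following minimal sketch assumes a sorted, gap-free list of record start LSNs taken from xlogdump output, written in the usual hex "X/Y" notation. The function names are hypothetical.

```python
from bisect import bisect_left


def parse_lsn(text):
    """Parse an 'X/Y' LSN (both parts hex) into a single comparable integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def record_for_page_lsn(record_starts, page_lsn):
    """Return the start LSN of the record that ends at page_lsn.

    record_starts: start LSNs (strings) of consecutive WAL records, sorted.
    The answer is the last record starting strictly before page_lsn, or
    None if page_lsn is not after any listed record.
    """
    keys = [parse_lsn(s) for s in record_starts]
    i = bisect_left(keys, parse_lsn(page_lsn))
    return record_starts[i - 1] if i > 0 else None
```

Consider the records quoted in this thread, which start at EA1/635290, EA1/637140 and EA1/638988. If a page's pd_lsn were EA1/638988, the lookup would return the record at EA1/637140.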
On 2014-01-31 15:15:24 +, Greg Stark wrote:
On Fri, Jan 31, 2014 at 3:08 PM, Andres Freund and...@2ndquadrant.com wrote:
It points to the end of the record (i.e. the beginning of the next). It
needs to, because otherwise XLogFlush()es on the pd_lsn wouldn't flush
enough.
Ah, in
On Fri, Jan 31, 2014 at 3:19 PM, Andres Freund and...@2ndquadrant.com wrote:
=# select get_raw_page('data_pkey', 'main', 11073632) ;
ERROR: block number 11073632 is out of range for relation data_pkey
Isn't the page 3634978?
The page in the record is.
But the page on disk is in the 54th
On 2014-01-31 15:21:35 +, Greg Stark wrote:
On Fri, Jan 31, 2014 at 3:19 PM, Andres Freund and...@2ndquadrant.com wrote:
=# select get_raw_page('data_pkey', 'main', 11073632) ;
ERROR: block number 11073632 is out of range for relation data_pkey
Isn't the page 3634978?
The page in
Andres Freund and...@2ndquadrant.com writes:
It's interesting that the smgr gets this wrong then (as also evidenced
by the fact that relation_size does as well). Could you please do an
ls -l path/to/relfilenode*?
IIRC, smgrnblocks will stop as soon as it finds a segment that is not
1GB in size.
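Tom's description of smgrnblocks can be modelled in a few lines. This is a toy model of the rule as stated here, not md.c itself. It assumes the default 1 GB segments of 8 kB blocks, and the function name is hypothetical. It illustrates that a short or empty segment ends the count, so data in any later segment file is invisible to the size calculation.

```python
RELSEG_SIZE = 131072  # blocks per segment: 1 GB / 8 kB (default build assumption)


def nblocks_from_segments(segment_blocks):
    """Count blocks the way described: stop at the first segment that is not full.

    segment_blocks[i] is the number of blocks in segment file i
    ("relfilenode", "relfilenode.1", ...). Segments after the first
    short one are never looked at.
    """
    total = 0
    for nblocks in segment_blocks:
        if nblocks < RELSEG_SIZE:
            return total + nblocks
        total += RELSEG_SIZE
    return total
```

For example, segment sizes [131072, 131072, 0, 131072] give 262144 blocks, even though the fourth segment holds data.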
On 2014-01-31 10:33:16 -0500, Tom Lane wrote:
Andres Freund and...@2ndquadrant.com writes:
It's interesting that the smgr gets this wrong then (as also evidenced
by the fact that relation_size does as well). Could you please do an
ls -l path/to/relfilenode*?
IIRC, smgrnblocks will stop as
Greg Stark st...@mit.edu writes:
On Fri, Jan 31, 2014 at 3:19 PM, Andres Freund and...@2ndquadrant.com wrote:
Isn't the page 3634978?
The page in the record is.
But the page on disk is in the 54th segment at offset 1F0C
So unless my arithmetic is wrong:
bc -l
ibase=16
400 * 400 *
Sorry guys. I transposed two numbers when looking up the relation.
data_pk wasn't the right index.
=# select (page_header(get_raw_page('index_data_id', 'main', 3020854))).* ;
lsn | tli | flags | lower | upper | special | pagesize | version | prune_xid
Greg Stark st...@mit.edu writes:
Sorry guys. I transposed two numbers when looking up the relation.
data_pk wasn't the right index.
=# select (page_header(get_raw_page('index_data_id', 'main', 3020854))).* ;
lsn | tli | flags | lower | upper | special | pagesize | version |
On Fri, Jan 31, 2014 at 3:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
400 * 400 * 400 / 2000 * 54 + 1F0C / 2000
11073632
Ooops, it's reading 54 in hex there.
# select ((2^30) * 54.0 + 'x1F0C'::bit(32)::int) / 8192;
?column?
----------
7141472
ibase=16
400 * 400 * 400 / 2000 * 36 +
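The segment/offset arithmetic is easy to get wrong when decimal segment numbers are mixed with hex offsets, as happened here. The following small sketch assumes the default 8 kB blocks and 1 GB segments, which gives 131072 blocks per segment. It also assumes a byte offset of 0x1F0C0000 within segment 54. That is the offset the SQL above effectively uses, because an explicit cast of 'x1F0C' to bit(32) pads it on the right with zeros. The function names are hypothetical.

```python
BLCKSZ = 8192                           # bytes per block (default build)
SEGMENT_BYTES = 1 << 30                 # bytes per segment file (default 1 GB)
RELSEG_SIZE = SEGMENT_BYTES // BLCKSZ   # 131072 blocks per segment


def block_number(segno, byte_offset):
    """Relation-wide block number for a byte offset inside segment segno."""
    if not (0 <= byte_offset < SEGMENT_BYTES and byte_offset % BLCKSZ == 0):
        raise ValueError("offset must be block-aligned and inside one segment")
    return segno * RELSEG_SIZE + byte_offset // BLCKSZ


def segment_location(blkno):
    """Inverse: (segment number, byte offset within that segment file)."""
    segno, within = divmod(blkno, RELSEG_SIZE)
    return segno, within * BLCKSZ
```

block_number(54, 0x1F0C0000) returns 7141472, the figure from the SQL query. block_number(0x54, 0x1F0C0000) treats 54 as hex and returns 11073632, the block that get_raw_page reported as out of range.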
So just to summarize, this xlog record:
[cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,
info:8, prev:EA1/635290] insert_leaf: s/d/r:1663/16385/1261982 tid
3634978/282
[cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,
info:8, prev:EA1/635290] bkpblock[1]:
Greg Stark st...@mit.edu writes:
So just to summarize, this xlog record:
[cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,
info:8, prev:EA1/635290] insert_leaf: s/d/r:1663/16385/1261982 tid
3634978/282
[cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,
One thing I keep coming back to is a bad RAM chip setting a bit in the
block number. But I just can't seem to get it to add up. The difference is
not a power of two, it has happened on two different machines, and we don't
see other weirdness on the machine. It seems like a strange coincidence it
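The "difference is a power of two" test only catches one case: a single bit turned on (0 to 1). A 1-to-0 flip makes the difference negative. A power-of-two difference can also come from a carry that changes several bits. Comparing the XOR of the two values against a single set bit checks for any single-bit flip. The following sketch uses hypothetical names. It does not plug in block numbers from the thread, because the text does not pin down exactly which pair was compared.

```python
def single_bit_flip(expected, observed):
    """True if observed differs from expected in exactly one bit position."""
    return bin(expected ^ observed).count("1") == 1


def bit_set_flip(expected, observed):
    """True if observed equals expected with one extra bit turned on.

    This is exactly the case in which observed - expected is a power of two
    and that bit was clear in expected (so no carry was involved).
    """
    diff = observed - expected
    return diff > 0 and diff & (diff - 1) == 0 and expected & diff == 0
```

For example, 6 and 8 differ by 2, a power of two, yet three bits change, so neither function treats that pair as a single flip.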
Greg Stark st...@mit.edu writes:
One thing I keep coming back to is a bad RAM chip setting a bit in the
block number. But I just can't seem to get it to add up. The difference is
not a power of two, it has happened on two different machines, and we don't
see other weirdness on the machine. It
On 01/31/2014 01:11 PM, Tom Lane wrote:
Greg Stark st...@mit.edu writes:
One thing I keep coming back to is a bad RAM chip setting a bit in the
block number. But I just can't seem to get it to add up. The difference is
not a power of two, it has happened on two different machines, and we don't
On Fri, Jan 31, 2014 at 10:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Yeah, I'd been wondering if the WAL record somehow got corrupted while
in memory (presumably after being CRC-checked). It's a bit hard to see
how though.
One thing I mentioned early on but bears repeating is that this
Josh Berkus j...@agliodbs.com writes:
FWIW, we've periodically seen reports from our clients of replica
databases being slightly larger than the master. Nothing reproducible
or as severe as Greg's issue, or we'd have reported it. But this could
be a more widespread issue, just that it
Hi,
On 2014-01-24 19:23:28 -0500, Greg Stark wrote:
Since the point release we've run into a number of databases that when
we restore from a base backup end up being larger than the primary
database was. Sometimes by a large factor. The data below is from
9.1.11 (both primary and standby) but
On Sun, Jan 26, 2014 at 9:45 AM, Andres Freund and...@2ndquadrant.com wrote:
Hi,
On 2014-01-24 19:23:28 -0500, Greg Stark wrote:
Since the point release we've run into a number of databases that when
we restore from a base backup end up being larger than the primary
database was. Sometimes
Since the point release we've run into a number of databases that when
we restore from a base backup end up being larger than the primary
database was. Sometimes by a large factor. The data below is from
9.1.11 (both primary and standby) but we've seen the same thing on
9.2.6.
primary$ for i in