Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Andres Freund
On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
 How would you characterize the chances of this happening with default
 *vacuum_freeze_*_age settings?  Offhand, it seems you would need to encounter
 this bug during each of ~10 generations of autovacuum_freeze_max_age before
 the old rows actually become invisible.

On second thought, it's quite possible to see problems before that
point, leading to more problems. A single occurrence of such an
illegitimate increase in relfrozenxid can be enough to cause problems of
a slightly different nature.
As relfrozenxid has been updated we might now, or after vacuuming some
other tables, become eligible to truncate the clog. In that case we'll
get ERRORs like "could not access status of transaction ..." if the tuple
hasn't been fully hinted when scanning it later.
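
For illustration (these queries are not from the original mail, just the
usual catalog lookups): the minimum datfrozenxid across all databases is
what bounds pg_clog truncation, so the same horizons DBAs watch for
wraparound also show how far clog may legitimately be cut:

  -- per-database horizon that limits clog truncation
  SELECT datname, datfrozenxid, age(datfrozenxid)
  FROM pg_database ORDER BY age(datfrozenxid) DESC;

  -- per-table horizons feeding into datfrozenxid
  SELECT relname, relfrozenxid, age(relfrozenxid)
  FROM pg_class WHERE relkind = 'r'
  ORDER BY age(relfrozenxid) DESC LIMIT 10;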

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Andres Freund
On 2013-12-01 13:33:42 +0100, Andres Freund wrote:
 On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
  How would you characterize the chances of this happening with default
  *vacuum_freeze_*_age settings?  Offhand, it seems you would need to 
  encounter
  this bug during each of ~10 generations of autovacuum_freeze_max_age before
  the old rows actually become invisible.
 
 On second thought, it's quite possible to see problems before that
 point, leading to more problems. A single occurrence of such an
 illegitimate increase in relfrozenxid can be enough to cause problems of
 a slightly different nature.
 As relfrozenxid has been updated we might now, or after vacuuming some
 other tables, become eligible to truncate the clog. In that case we'll
 get ERRORs like "could not access status of transaction ..." if the tuple
 hasn't been fully hinted when scanning it later.

And indeed, a quick search shows up some threads that might suffer from
it:
bd7ee863f673a14ebf99d95562aee15e44b1d...@digi-pdc.digitilitiprod.int
caazpmnxfdrv72wdmbev5tcqobye_wvgseqrkqj0fizxmcyy...@mail.gmail.com
cak9ovjwvazlmdmrhmpg1+s37z16j+bz8fbarzspmrhsxxh-...@mail.gmail.com

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Noah Misch
On Sun, Dec 01, 2013 at 01:55:45PM +0100, Andres Freund wrote:
 On 2013-12-01 13:33:42 +0100, Andres Freund wrote:
  On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
   How would you characterize the chances of this happening with default
   *vacuum_freeze_*_age settings?  Offhand, it seems you would need to 
   encounter
   this bug during each of ~10 generations of autovacuum_freeze_max_age 
   before
   the old rows actually become invisible.
  
  On second thought, it's quite possible to see problems before that
  point, leading to more problems. A single occurrence of such an
  illegitimate increase in relfrozenxid can be enough to cause problems of
  a slightly different nature.
  As relfrozenxid has been updated we might now, or after vacuuming some
  other tables, become eligible to truncate the clog. In that case we'll
  get ERRORs like "could not access status of transaction ..." if the tuple
  hasn't been fully hinted when scanning it later.

Agreed.  Probably, the use of hint bits and the low frequency with which
TruncateCLOG() can actually remove something has kept this below the radar.

 And indeed, a quick search shows up some threads that might suffer from
 it:
 bd7ee863f673a14ebf99d95562aee15e44b1d...@digi-pdc.digitilitiprod.int

This system had multiple problems, a missing pg_subtrans file and a missing
TOAST chunk for pg_attribute.  I don't see a pg_clog problem connecting it to
the freeze bug at hand.

 caazpmnxfdrv72wdmbev5tcqobye_wvgseqrkqj0fizxmcyy...@mail.gmail.com

This report is against PostgreSQL 8.1.11, which never had a commit like
b4b6923.  If a similar bug is at work, this older version acquired it through
a different vector.

 cak9ovjwvazlmdmrhmpg1+s37z16j+bz8fbarzspmrhsxxh-...@mail.gmail.com

Possible match, but suggestions of lower-level problems cloud the diagnosis.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Andres Freund
On 2013-12-01 12:49:40 -0500, Noah Misch wrote:
 This system had multiple problems, a missing pg_subtrans file and a missing
 TOAST chunk for pg_attribute.  I don't see a pg_clog problem connecting it to
 the freeze bug at hand.

Those all sound like possible problems caused by the bug, no?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Noah Misch
On Sun, Dec 01, 2013 at 06:56:10PM +0100, Andres Freund wrote:
 On 2013-12-01 12:49:40 -0500, Noah Misch wrote:
  This system had multiple problems, a missing pg_subtrans file and a missing
  TOAST chunk for pg_attribute.  I don't see a pg_clog problem connecting it 
  to
  the freeze bug at hand.
 
 Those all sound like possible problems caused by the bug, no?

pg_subtrans has a lifecycle unrelated to datfrozenxid.  I am not aware of a
mechanism connecting that problem to the bug at hand.

The missing TOAST chunk (in pg_statistic, not pg_attribute as I wrote above)
could happen from the XID space wrapping with that TOAST table page marked
all-visible but not frozen.  The bug reporter describes the start of that
error coinciding with a minor version upgrade, so that would be an odd
coincidence: 8.4.3 did not have the bug as we know it, so considerable time
would typically pass between the upgrade and that symptom appearing.  Can't
rule it out, but this user report fits the known bug symptoms only loosely.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 The VACUUM implementation in 9.3 had several bugs: It removed multixact
 xmax values without regard to the importance of the contained xids, it did
 not remove multixacts if the contained xids were too old, and it relied
 on hint bits - which might not have been set on replicas - when checking
 whether a row needed to be frozen.

Uh ... what does the last have to do with it?  Surely we don't run
VACUUM on replicas.  Or are you talking about what might happen when
VACUUM is run on a former replica that's been promoted to master?

regards, tom lane




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Andres Freund


Tom Lane t...@sss.pgh.pa.us wrote:
Andres Freund and...@2ndquadrant.com writes:
 The VACUUM implementation in 9.3 had several bugs: It removed multixact
 xmax values without regard to the importance of the contained xids, it did
 not remove multixacts if the contained xids were too old, and it relied
 on hint bits - which might not have been set on replicas - when checking
 whether a row needed to be frozen.

Uh ... what does the last have to do with it?  Surely we don't run
VACUUM on replicas.  Or are you talking about what might happen when
VACUUM is run on a former replica that's been promoted to master?

Unfortunately not. The problem is that xl_heap_freeze's redo function simply 
reexecutes heap_freeze_tuple() instead of logging much about each tuple...

Andres


-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Andres Freund


Noah Misch n...@leadboat.com wrote:
On Sun, Dec 01, 2013 at 06:56:10PM +0100, Andres Freund wrote:
 On 2013-12-01 12:49:40 -0500, Noah Misch wrote:
  This system had multiple problems, a missing pg_subtrans file and a missing
  TOAST chunk for pg_attribute.  I don't see a pg_clog problem connecting it to
  the freeze bug at hand.
 
 Those all sound like possible problems caused by the bug, no?

pg_subtrans has a lifecycle unrelated to datfrozenxid.  I am not aware of a
mechanism connecting that problem to the bug at hand.

TransactionIdIsInProgress returns true for future xids. Which triggers the use
of XactLockTableWait. Which then does SubTransGetParent of a far future xid...

Andres

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Tom Lane t...@sss.pgh.pa.us schrieb:
 Uh ... what does the last have to do with it?  Surely we don't run
 VACUUM on replicas.  Or are you talking about what might happen when
 VACUUM is run on a former replica that's been promoted to master?

 Unfortunately not. The problem is that xl_heap_freeze's redo function simply 
 reexecutes heap_freeze_tuple() instead of logging much about each tuple...

That was a pretty stupid choice ... we should think seriously about
changing that for 9.4.  In general the application of a WAL record
needs to be 100% deterministic.

regards, tom lane




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 * Fix possible data corruptions due to incomplete vacuuming (Andres Freund, 
 Heikki Linnakangas)

 Due to this bug (auto-)vacuum could sometimes treat a partial vacuum as
 a full-table vacuum, mistakenly increasing relfrozenxid as a result. This
 could happen if it managed to truncate the tail end of the table due to
 dead space. Possible consequences are:
 * Errors like "could not access status of transaction XXX" when
   accessing such rows.
 * Vanishing rows after more than 2^31 transactions have passed.

Is there really a significant risk of clog access errors due to this bug?
IIUC, the risk is that tuples in pages that vacuum skips due to being
all-visible might not be frozen when intended.  But it seems just about
certain that such tuples would be properly hinted already, which means
that nothing would ever go to clog to confirm that.  So ISTM the only real
risk is that rows would become invisible after 2^31 transactions (and then
visible again after 2^31 more).

And even then you'd need persistent bad luck, in the form of incorrect
advancements of relfrozenxid having happened often enough to prevent any
anti-wraparound vacuums from getting launched.

Am I missing something?  It's certainly a bad bug, but it seems to me
that the probability of data loss is low enough that it's not so
surprising this has escaped detection for so long.

regards, tom lane




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Andres Freund
On 2013-12-01 17:15:31 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  * Fix possible data corruptions due to incomplete vacuuming (Andres Freund, 
  Heikki Linnakangas)
 
  Due to this bug (auto-)vacuum could sometimes treat a partial vacuum as
  a full-table vacuum, mistakenly increasing relfrozenxid as a result. This
  could happen if it managed to truncate the tail end of the table due to
  dead space. Possible consequences are:
  * Errors like "could not access status of transaction XXX" when
    accessing such rows.
  * Vanishing rows after more than 2^31 transactions have passed.
 
 Is there really a significant risk of clog access errors due to this bug?
 IIUC, the risk is that tuples in pages that vacuum skips due to being
 all-visible might not be frozen when intended.  But it seems just about
 certain that such tuples would be properly hinted already, which means
 that nothing would ever go to clog to confirm that.  So ISTM the only real
 risk is that rows would become invisible after 2^31 transactions (and then
 visible again after 2^31 more).

Unfortunately it's not actually too hard to hit, due to the following part of
the code in vacuumlazy.c:

/* We need buffer cleanup lock so that we can prune HOT chains. */
if (!ConditionalLockBufferForCleanup(buf))
{
/*
 * If we're not scanning the whole relation to guard against XID
 * wraparound, it's OK to skip vacuuming a page.  The next vacuum
 * will clean it up.
 */
if (!scan_all)
{
ReleaseBuffer(buf);
continue;
}
...

if you have some concurrency you hit that path pretty often. And once
such a vacuum went through the table it will have a higher relfrozenxid,
so an impending wave of anti-wraparound vacuums won't scan it.

Also, the hint bits won't be on potential standbys, so that's not
necessarily preventing problems.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Andres Freund
On 2013-12-01 15:54:41 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Tom Lane t...@sss.pgh.pa.us schrieb:
  Uh ... what does the last have to do with it?  Surely we don't run
  VACUUM on replicas.  Or are you talking about what might happen when
  VACUUM is run on a former replica that's been promoted to master?
 
  Unfortunately not. The problem is that xl_heap_freeze's redo function 
  simply reexecutes heap_freeze_tuple() instead of logging much about each
  tuple...
 
 That was a pretty stupid choice ... we should think seriously about
 changing that for 9.4.  In general the application of a WAL record
 needs to be 100% deterministic.

Completely agreed. I'm not really sure what led to that design choice
except the desire to save a bit of WAL volume. It's a pretty old piece
of code - a good while before I followed development in any form of
detail.
It's actually in the original commit
(48188e1621bb6711e7d092bee48523b18cd80177) that introduced today's form
of freezing.

If it had been a more robust format all along, potential damage from the
replication bug could have been fixed by a VACUUM FREEZE...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-12-01 17:15:31 -0500, Tom Lane wrote:
 Is there really a significant risk of clog access errors due to this bug?
 IIUC, the risk is that tuples in pages that vacuum skips due to being
 all-visible might not be frozen when intended.

 Unfortunately it's not actually too hard to hit, due to the following part of the
 code in vacuumlazy.c:

   /*
* If we're not scanning the whole relation to guard against XID
* wraparound, it's OK to skip vacuuming a page.  The next vacuum
* will clean it up.
*/

Ah.  So it's only been *seriously* broken since commit bbb6e559c, ie 9.2.

regards, tom lane




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-12-01 Thread Andres Freund
On 2013-12-01 18:02:27 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-12-01 17:15:31 -0500, Tom Lane wrote:
  Is there really a significant risk of clog access errors due to this bug?
  IIUC, the risk is that tuples in pages that vacuum skips due to being
  all-visible might not be frozen when intended.
 
  Unfortunately it's not actually too hard to hit, due to the following part of the
  code in vacuumlazy.c:
 
  /*
   * If we're not scanning the whole relation to guard against XID
   * wraparound, it's OK to skip vacuuming a page.  The next vacuum
   * will clean it up.
   */
 
 Ah.  So it's only been *seriously* broken since commit bbb6e559c, ie 9.2.

Well, even before that, crash recovery/replication didn't necessarily
preserve the hint bits. Even more so if somebody dared to set
full_page_writes=off.

I personally think full_page_writes=off should conflict with wal_level
!= minimal, btw, but I don't see much chance of gaining acceptance for
that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Andres Freund
Hi Noah,

On 2013-11-30 00:40:06 -0500, Noah Misch wrote:
   On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:
With regard to fixing things up, ISTM the best bet is heap_prune_chain()
so far. That's executed by vacuum and by opportunistic pruning and we
know we have the appropriate locks there. Looks relatively easy to fix
up things there. Not sure if there are any possible routes to WAL log
this but using log_newpage()?
I am really not sure what the best course of action is :(
 
 Based on subsequent thread discussion, the plan you outline sounds reasonable.
 Here is a sketch of the specific semantics of that fixup.  If a HEAPTUPLE_LIVE
 tuple has XIDs older than the current relfrozenxid/relminmxid of its relation
 or newer than ReadNewTransactionId()/ReadNextMultiXactId(), freeze those XIDs.
 Do likewise for HEAPTUPLE_DELETE_IN_PROGRESS, ensuring a proper xmin if the
 in-progress deleter aborts.  Using log_newpage_buffer() seems fine; there's no
 need to optimize performance there.

We'd need to decide what to do with xmax values; they'd likely need to
be treated differently.

The problem with log_newpage_buffer() is that we'd quite possibly issue
one such call per item on a page. And that might become quite
expensive. Logging ~1.5MB per 8k page in the worst case sounds a bit
scary.

 (All the better if we can, with minimal
 hacks, convince heap_freeze_tuple() itself to log the right changes.)

That likely comes too late - we've already pruned the page and might have
made wrong decisions there. Also, heap_freeze_tuple() is run on both the
primary and standbys.
I think our xl_heap_freeze format, which relies on running
heap_freeze_tuple() during recovery, is a terrible idea, but we can't
change that right now.

 Time is tight to finalize this, but it would be best to get this into next
 week's release.  That way, the announcement, fix, and mitigating code
 pertaining to this data loss bug all land in the same release.  If necessary,
 I think it would be worth delaying the release, or issuing a new release a
 week or two later, to closely align those events.  That being said, I'm
 prepared to review a patch in this area over the weekend.

I don't think I currently have the energy/brainpower/time to develop
such a fix in suitable quality by Monday. I've worked pretty hard on
trying to fix the host of multixact data corruption bugs over the last few
days, and developing a solution that I'd be happy to put into such critical
paths is certainly several days' worth of work.

I am not sure if it's a good idea to delay the release because of this,
there are so many other critical issues that that seems like a bad
tradeoff.

That said, if somebody else is taking the lead I am certainly willing to
help in detail with review and testing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I am not sure if it's a good idea to delay the release because of this,
 there are so many other critical issues that that seems like a bad
 tradeoff.

Indeed.  We already said that this release was being done *now* because
of the replication bug, and I see no reason to change that.  The more
last-minute stuff we try to cram in, the bigger risk of (another)
mistake.  We've already taken a credibility hit from introducing a new
bug into the last round of update releases; let's please not take a
risk of doing that again.

regards, tom lane




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Andres Freund
On 2013-11-30 11:18:09 -0500, Tom Lane wrote:
 We've already taken a credibility hit from introducing a new
 bug into the last round of update releases; let's please not take a
 risk of doing that again.

On that front: I'd love for somebody else to look at the revised 9.3
freezing logic. It's way too complicated and there are quite a few
subtleties about freezing that are hard to get right, as evidenced by
9.3.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Noah Misch
On Sat, Nov 30, 2013 at 05:22:04PM +0100, Andres Freund wrote:
 On that front: I'd love for somebody else to look at the revised 9.3
 freezing logic.

Do you speak of the changes to xmax freezing arising from the FK lock
optimization?

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Andres Freund
On 2013-11-30 11:18:09 -0500, Tom Lane wrote:
 Indeed.  We already said that this release was being done *now* because
 of the replication bug, and I see no reason to change that.

FWIW, I think the two other data-corrupting bugs, incomplete freezing
due to truncation (all branches) and freezing overall (in 9.3), are at
least as bad because they take effect on the primary.
Not saying that because of my involvement, but because I think they need
to be presented at least as prominently in the release notes.
The bugs themselves are all fixed in the relevant branches, but I do
think we need to talk about how to detect and fix possible
corruption.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 FWIW, I think the two other data-corrupting bugs, incomplete freezing
 due to truncation (all branches) and freezing overall (in 9.3), are at
 least as bad because they take effect on the primary.
 Not saying that because of my involvement, but because I think they need
 to be presented at least as prominently in the release notes.
 The bugs themselves are all fixed in the relevant branches, but I do
 think we need to talk about how to detect and fix possible
 corruption.

I was planning to draft up the release notes today.  Can you propose
text about the above?

regards, tom lane




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Andres Freund
On 2013-11-30 11:40:36 -0500, Noah Misch wrote:
 On Sat, Nov 30, 2013 at 05:22:04PM +0100, Andres Freund wrote:
  On that front: I'd love for somebody else to look at the revised 9.3
  freezing logic.
 
 Do you speak of the changes to xmax freezing arising from the FK lock
 optimization?

Yes. There had been several major bugs in 9.3+ around freezing:
* old updater xids in multixacts weren't subjected to the new xmin
  horizon => inaccessible or duplicated rows.
* If a multi was too old, we simply removed it, even if it contained a
  committed xmax => duplicate rows
* If HEAP_XMAX_INVALID was set in heap_freeze_tuple() and
  heap_tuple_needs_freeze() we didn't do anything about xmax. That's
  completely not okay since the hint bit might not have been set on the
  standby => diverging standby and primary, with unfrozen rows remaining
  on the standby, missing or duplicate rows there.

The fixed (2393c7d102368717283d7121a6ea8164e902b011) heap_freeze_tuple()
and heap_tuple_needs_freeze() look much safer to me, but there's lots of
complexity there. I don't see any alternative to the complexity until we
change the format of xl_heap_freeze, but some more eyes than Alvaro's
and mine definitely would be good.

71ad5a8475b4df896a7baa71e6dd3c455eebae99 also touches quite a few
intricate code paths and fixes a good number of bugs, so it's definitely
also worthy of further inspection.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Andres Freund
On 2013-11-30 11:50:57 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  FWIW, I think the two other data-corrupting bugs, incomplete freezing
  due to truncation (all branches) and freezing overall (in 9.3), are at
  least as bad because they take effect on the primary.
  Not saying that because of my involvement, but because I think they need
  to be presented at least as prominently in the release notes.
  The bugs themselves are all fixed in the relevant branches, but I do
  think we need to talk about how to detect and fix possible
  corruption.
 
 I was planning to draft up the release notes today.  Can you propose
 text about the above?

I can, but it will be a couple of hours before I can give it serious
thought (starving and insanity being serious perils otherwise ;)).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Noah Misch
On Sat, Nov 30, 2013 at 05:00:58PM +0100, Andres Freund wrote:
 The problem with log_newpage_buffer() is that we'd quite possibly issue
 one such call per item on a page. And that might become quite
 expensive. Logging ~1.5MB per 8k page in the worst case sounds a bit
 scary.

I had in mind issuing at most one call per page.  heap_page_prune() has a
structure conducive to that.

 On 2013-11-30 00:40:06 -0500, Noah Misch wrote:
  Time is tight to finalize this, but it would be best to get this into next
  week's release.  That way, the announcement, fix, and mitigating code
  pertaining to this data loss bug all land in the same release.  If 
  necessary,
  I think it would be worth delaying the release, or issuing a new release a
  week or two later, to closely align those events.

 I am not sure if it's a good idea to delay the release because of this,
 there are so many other critical issues that that seems like a bad
 tradeoff.

Fair enough; I'll drop that proposal.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Andres Freund
On 2013-11-30 12:22:16 -0500, Noah Misch wrote:
 On Sat, Nov 30, 2013 at 05:00:58PM +0100, Andres Freund wrote:
  The problem with log_newpage_buffer() is that we'd quite possibly issue
  one such call per item on a page. And that might become quite
  expensive. Logging ~1.5MB per 8k page in the worst case sounds a bit
  scary.
 
 I had in mind issuing at most one call per page.  heap_page_prune() has a
 structure conducive to that.

That, at least as far as I can imagine, would make the fix quite
complicated though. In the first phase of heap_page_prune() we aren't in a
critical section and cannot modify the buffer yet, so we would have to make
all the involved code cope with the unfixed xids and hint bits.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-11-30 11:50:57 -0500, Tom Lane wrote:
 I was planning to draft up the release notes today.  Can you propose
 text about the above?

 I can, but it will be a couple of hours before I can give it serious
 thought (starving and insanity being serious perils otherwise ;)).

Sure.  I can wait till tomorrow as far as this aspect is concerned.

regards, tom lane




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-30 Thread Andres Freund
On 2013-11-30 11:50:57 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  FWIW, I think the two other data-corrupting bugs, incomplete freezing
  due to truncation (all branches) and freezing overall (in 9.3), are at
  least as bad because they take effect on the primary.
  Not saying that because of my involvement, but because I think they need
  to be presented at least as prominently in the release notes.
  The bugs themselves are all fixed in the relevant branches, but I do
  think we need to talk about how to detect and fix possible
  corruption.
 
 I was planning to draft up the release notes today.  Can you propose
 text about the above?

* Fix possible data corruptions due to incomplete vacuuming (Andres Freund, 
Heikki Linnakangas)

Due to this bug (auto-)vacuum could sometimes treat a partial vacuum as
a full-table vacuum, mistakenly increasing relfrozenxid as a result. This
could happen if it managed to truncate the tail end of the table due to
dead space. Possible consequences are:
* Errors like "could not access status of transaction XXX" when
  accessing such rows.
* Vanishing rows after more than 2^31 transactions have passed.

Tables in which some parts change only infrequently while others change
heavily are more likely to be affected.

It is recommended to perform a VACUUM of all tables in all databases
with vacuum_freeze_table_age set to zero. This will fix latent
corruption but will not be able to fix all errors.

To detect whether a database is possibly affected, check whether either
SELECT txid_current() < 2^31 returns 'f' or a VACUUM of all tables
with vacuum_freeze_table_age set to zero returns errors. If neither is
the case, the database is safe after performing the VACUUM.
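
Spelled out, an illustrative sketch of that detection procedure (not
verbatim commands from the draft, just one way to run the checks):

  -- 'f' here means more than 2^31 xids have been consumed, i.e. the
  -- database could already be affected:
  SELECT txid_current() < 2^31;

  -- then, connected to each database in turn:
  SET vacuum_freeze_table_age = 0;
  VACUUM VERBOSE;  -- "could not access status of transaction" errors indicate corruption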

If you think you are suffering from this corruption, please contact the
pgsql-hackers mailing list or your service provider; data is likely to
be recoverable.

Users updating from 9.0.4/8.4.8 or earlier are not affected.

All branches.

Commit: 82b43f7df2036d06b4410721f77512969846b6d0

* Fix possible data corruptions due to several bugs around vacuuming [in the 
9.3 foreign key locks feature] (Andres Freund, Alvaro Herrera)

The VACUUM implementation in 9.3 had several bugs: It removed multixact
xmax values without regard to the importance of the contained xids, it did
not remove multixacts if the contained xids were too old, and it relied
on hint bits - which might not have been set on replicas - when checking
whether a row needed to be frozen.

Databases on a primary are unlikely to be affected if no VACUUM FREEZE
and no VACUUM with a nondefault vacuum_freeze_min_age was ever executed,
and if SELECT relminmxid FROM pg_class WHERE relkind = 'r' AND NOT oid =
1262 AND NOT relminmxid = 1 returns no rows (spelled out below).
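
That check, spelled out as a query (illustrative formatting of the
condition above, nothing beyond it):

  SELECT relminmxid FROM pg_class
  WHERE relkind = 'r' AND NOT oid = 1262 AND NOT relminmxid = 1;
  -- no rows returned (together with the conditions above) means the
  -- database is unlikely to be affected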

Possible consequences are:
* Duplicated or vanishing rows.
* Errors like "could not access status of transaction XXX" when
  accessing rows.
* Primary and standby servers getting out of sync.

It is strongly recommended to re-clone all standby servers after
upgrading, especially if full_page_writes was set to false.  On the
primary it is recommended to execute a VACUUM of all tables in all
databases after upgrading both the primary and any existing
standbys, with vacuum_freeze_table_age set to zero. This will fix
latent corruption on primaries but will not be able to fix all
pre-existing errors.

If you think you are suffering from data loss due to this corruption on the
primary, please contact the pgsql-hackers mailing list or your service
provider; some data might be recoverable.

9.3 only, but should be mentioned first as corruption due to this is quite 
likely.

Commit: 2393c7d102368717283d7121a6ea8164e902b011

I had quite a hard time - likely noticeable - summarizing the second
item in easy-to-understand terms. The interactions are quite
complicated.
We could tell users they don't necessarily need to re-clone standbys if
no xids have been truncated away (txid_current() < 2^31, and
datfrozenxid of at least one database = 1), but given the replication
issue that seems like unnecessary confusion.

Questions?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-29 Thread Noah Misch
  On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:
   With regard to fixing things up, ISTM the best bet is heap_prune_chain()
   so far. That's executed by vacuum and by opportunistic pruning and we
   know we have the appropriate locks there. Looks relatively easy to fix
   up things there. Not sure if there are any possible routes to WAL log
   this but using log_newpage()?
   I am really not sure what the best course of action is :(

Based on subsequent thread discussion, the plan you outline sounds reasonable.
Here is a sketch of the specific semantics of that fixup.  If a HEAPTUPLE_LIVE
tuple has XIDs older than the current relfrozenxid/relminmxid of its relation
or newer than ReadNewTransactionId()/ReadNextMultiXactId(), freeze those XIDs.
Do likewise for HEAPTUPLE_DELETE_IN_PROGRESS, ensuring a proper xmin if the
in-progress deleter aborts.  Using log_newpage_buffer() seems fine; there's no
need to optimize performance there.  (All the better if we can, with minimal
hacks, convince heap_freeze_tuple() itself to log the right changes.)

I am wary about the performance loss of doing these checks in every
heap_prune_chain() call, for all time.  If it's measurable, can we shed
the overhead once corrections are done?  Maybe bump the page layout version
and skip the checks for v5 pages?  (Ugh.)

Time is tight to finalize this, but it would be best to get this into next
week's release.  That way, the announcement, fix, and mitigating code
pertaining to this data loss bug all land in the same release.  If necessary,
I think it would be worth delaying the release, or issuing a new release a
week or two later, to closely align those events.  That being said, I'm
prepared to review a patch in this area over the weekend.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-27 Thread Heikki Linnakangas

On 11/27/13 01:21, Andres Freund wrote:

On 2013-11-26 13:32:44 +0100, Andres Freund wrote:

This seems to be the case since
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
scan_all to determine whether we can set new_frozen_xid. That's a slight
pessimization when we scan a relation fully without explicitly scanning
it in its entirety, but given this isn't the first bug around
scanned_pages/rel_pages I'd rather go that way. The aforementioned
commit wasn't primarily concerned with that.
Alternatively we could just compute new_frozen_xid et al before the
lazy_truncate_heap.


I've gone for the latter in this preliminary patch. Not increasing
relfrozenxid after an initial data load seems like a bit of a shame.

I wonder if we should just do scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages?


Hmm, you did (scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages) for relminmxid, and just
(vacrelstats->scanned_pages < vacrelstats->rel_pages) for relfrozenxid.
That was probably not what you meant to do; the thing you did for
relfrozenxid looks good to me.


Does the attached look correct to you?

- Heikki
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6778c7d..6688ab3 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -178,7 +178,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	int			usecs;
 	double		read_rate,
 write_rate;
-	bool		scan_all;
+	bool		scan_all;		/* should we scan all pages? */
+	bool		scanned_all;	/* did we actually scan all pages? */
 	TransactionId freezeTableLimit;
 	BlockNumber new_rel_pages;
 	double		new_rel_tuples;
@@ -226,6 +227,21 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/*
+	 * Compute whether we actually scanned the whole relation. If we did, we
+	 * can adjust relfrozenxid and relminmxid.
+	 *
+	 * NB: We need to check this before truncating the relation, because that
+	 * will change ->rel_pages.
+	 */
+	if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
+	{
+		Assert(!scan_all);
+		scanned_all = false;
+	}
+	else
+		scanned_all = true;
+
+	/*
 	 * Optionally truncate the relation.
 	 *
 	 * Don't even think about it unless we have a shot at releasing a goodly
@@ -254,8 +270,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	 * is all-visible we'd definitely like to know that.  But clamp the value
 	 * to be not more than what we're setting relpages to.
 	 *
-	 * Also, don't change relfrozenxid if we skipped any pages, since then we
-	 * don't know for certain that all tuples have a newer xmin.
+	 * Also, don't change relfrozenxid/relminmxid if we skipped any pages,
+	 * since then we don't know for certain that all tuples have a newer xmin.
 	 */
 	new_rel_pages = vacrelstats->rel_pages;
 	new_rel_tuples = vacrelstats->new_rel_tuples;
@@ -269,13 +285,8 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	if (new_rel_allvisible > new_rel_pages)
 		new_rel_allvisible = new_rel_pages;
 
-	new_frozen_xid = FreezeLimit;
-	if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
-		new_frozen_xid = InvalidTransactionId;
-
-	new_min_multi = MultiXactCutoff;
-	if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
-		new_min_multi = InvalidMultiXactId;
+	new_frozen_xid = scanned_all ? FreezeLimit : InvalidTransactionId;
+	new_min_multi = scanned_all ? MultiXactCutoff : InvalidMultiXactId;
 
 	vac_update_relstats(onerel,
 		new_rel_pages,



Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-27 Thread Andres Freund
On 2013-11-27 11:01:55 +0200, Heikki Linnakangas wrote:
 On 11/27/13 01:21, Andres Freund wrote:
 On 2013-11-26 13:32:44 +0100, Andres Freund wrote:
 This seems to be the case since
 b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
 scan_all to determine whether we can set new_frozen_xid. That's a slight
 pessimization when we scan a relation fully without explicitly scanning
 it in its entirety, but given this isn't the first bug around
 scanned_pages/rel_pages I'd rather go that way. The aforementioned
 commit wasn't primarily concerned with that.
 Alternatively we could just compute new_frozen_xid et al before the
 lazy_truncate_heap.
 
 I've gone for the latter in this preliminary patch. Not increasing
 relfrozenxid after an initial data load seems like a bit of a shame.
 
 I wonder if we should just do scan_all || vacrelstats->scanned_pages <
 vacrelstats->rel_pages?
 
 Hmm, you did (scan_all || vacrelstats->scanned_pages <
 vacrelstats->rel_pages) for relminmxid, and just (vacrelstats->scanned_pages
  < vacrelstats->rel_pages) for relfrozenxid. That was probably not what you
 meant to do; the thing you did for relfrozenxid looks good to me.

I said it's a preliminary patch ;) - really, I wasn't sure which of the two
to go for.

 Does the attached look correct to you?

Looks good.

I wonder if we need to integrate any mitigating logic? Currently the
corruption may only become apparent long after it occurred; that's
pretty bad. And instructing people to run a vacuum after the upgrade will
cause the corrupted data to be lost if more than 2^31 xids have already
passed. But integrating logic to fix things into heap_page_prune() looks
somewhat ugly as well.
Afaics the likelihood of the issue occurring on non-all-visible pages is
pretty low, since they'd need to be skipped due to lock contention
repeatedly.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-27 Thread Heikki Linnakangas

On 11/27/13 11:15, Andres Freund wrote:

On 2013-11-27 11:01:55 +0200, Heikki Linnakangas wrote:

On 11/27/13 01:21, Andres Freund wrote:

On 2013-11-26 13:32:44 +0100, Andres Freund wrote:

This seems to be the case since
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
scan_all to determine whether we can set new_frozen_xid. That's a slight
pessimization when we scan a relation fully without explicitly scanning
it in its entirety, but given this isn't the first bug around
scanned_pages/rel_pages I'd rather go that way. The aforementioned
commit wasn't primarily concerned with that.
Alternatively we could just compute new_frozen_xid et al before the
lazy_truncate_heap.


I've gone for the latter in this preliminary patch. Not increasing
relfrozenxid after an initial data load seems like a bit of a shame.

I wonder if we should just do scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages?


Hmm, you did (scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages) for relminmxid, and just (vacrelstats->scanned_pages
 < vacrelstats->rel_pages) for relfrozenxid. That was probably not what you
meant to do; the thing you did for relfrozenxid looks good to me.


I said it's a preliminary patch ;) - really, I wasn't sure which of the two
to go for.


Does the attached look correct to you?


Looks good.


Ok, committed and backpatched that.


I wonder if we need to integrate any mitigating logic? Currently the
corruption may only become apparent long after it occurred; that's
pretty bad. And instructing people to run a vacuum after the upgrade will
cause the corrupted data to be lost if more than 2^31 xids have already passed.


Ugh :-(. Running vacuum after the upgrade is the right thing to do to 
prevent further damage, but you're right that it will cause any 
already-wrapped around data to be lost forever. Nasty.



But integrating logic to fix things into heap_page_prune() looks
somewhat ugly as well.


I think any mitigating logic we might add should go into vacuum. It 
should be possible for a DBA to run a command, and after it's finished, 
be confident that you're safe. That means vacuum.



Afaics the likelihood of the issue occuring on non-all-visible pages is
pretty low, since they'd need to be skipped due to lock contention
repeatedly.


Hmm. If a page has its visibility-map flag set, but contains a tuple
that appears to be dead because you've wrapped around, vacuum will give
a warning: "page containing dead tuples is marked as all-visible in
relation \"%s\" page %u". So I think if a manual VACUUM FREEZE passes
without giving that warning, that vacuum hasn't destroyed any data. So
we could advise to take a physical backup of the data directory, and run
a manual VACUUM FREEZE on all databases. If it doesn't give a warning,
you're safe from that point onwards. If it does, you'll want to recover
from an older backup, or try to manually salvage just the lost rows from
the backup, and re-index. Ugly, but hopefully rare in practice.
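
In practice that advice boils down to something like the following sketch
(illustrative only; it assumes the file-level backup of the data directory
described above has already been taken):

  -- in every database of the cluster:
  VACUUM FREEZE VERBOSE;
  -- then check the output / server log for:
  --   WARNING:  page containing dead tuples is marked as all-visible
  --             in relation "..." page ...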


Unfortunately that doesn't mean that you haven't already lost some data. 
Wrap-around could've already happened, and vacuum might already have run 
and removed some rows. You'll want to examine your logs and grep for 
that warning.


- Heikki




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-27 Thread Andres Freund
On 2013-11-27 13:56:58 +0200, Heikki Linnakangas wrote:
 Ok, committed and backpatched that.

Thanks.

 I wonder if we need to integrate any mitigating logic? Currently the
 corruption may only become apparent long after it occurred; that's
 pretty bad. And instructing people to run a vacuum after the upgrade will
 cause the corrupted data to be lost if more than 2^31 xids have already passed.
 
 Ugh :-(. Running vacuum after the upgrade is the right thing to do to
 prevent further damage, but you're right that it will cause any
 already-wrapped around data to be lost forever. Nasty.

 But integrating logic to fix things into heap_page_prune() looks
 somewhat ugly as well.
 
 I think any mitigating logic we might add should go into vacuum. It should
 be possible for a DBA to run a command, and after it's finished, be
 confident that you're safe. That means vacuum.

Well, heap_page_prune() is the first thing that's executed by
lazy_scan_heap(); that's why I was talking about it. So anything we do
needs to happen in there or before.

 Afaics the likelihood of the issue occuring on non-all-visible pages is
 pretty low, since they'd need to be skipped due to lock contention
 repeatedly.

 Hmm. If a page has its visibility-map flag set, but contains a tuple that
 appears to be dead because you've wrapped around, vacuum will give a
 warning: "page containing dead tuples is marked as all-visible in relation
 \"%s\" page %u".

I don't think this warning is likely to be hit as the code stands -
heap_page_prune() et al. will have removed all dead tuples already,
right? And so has_dead_tuples won't be set.

Independent from this, ISTM we should add a
   else if (PageIsAllVisible(page) && !all_visible)
to those checks.


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-27 Thread Heikki Linnakangas

On 11/27/13 14:11, Andres Freund wrote:

On 2013-11-27 13:56:58 +0200, Heikki Linnakangas wrote:

Afaics the likelihood of the issue occuring on non-all-visible pages is
pretty low, since they'd need to be skipped due to lock contention
repeatedly.



Hmm. If a page has its visibility-map flag set, but contains a tuple that
appears to be dead because you've wrapped around, vacuum will give a
warning: "page containing dead tuples is marked as all-visible in relation
\"%s\" page %u".


I don't think this warning is likely to be hit as the code stands -
heap_page_prune() et al. will have removed all dead tuples already,
right? And so has_dead_tuples won't be set.


It might be a good idea to add such a warning to heap_page_prune(). Or
also emit the warning in lazy_scan_heap() if heap_page_prune() returned > 0.



Independent from this, ISTM we should add a
else if (PageIsAllVisible(page) && !all_visible)
to those checks.


Can you elaborate, where should that be added?

- Heikki




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-27 Thread Andres Freund
On 2013-11-27 14:45:25 +0200, Heikki Linnakangas wrote:
 On 11/27/13 14:11, Andres Freund wrote:
 I don't think this warning is likely to be hit as the code stands -
 heap_page_prune() et al. will have removed all dead tuples already,
 right? And so has_dead_tuples won't be set.
 
 It might be a good idea to add such a warning to heap_page_prune(). Or also
 emit the warning in lazy_scan_heap() if heap_page_prune() returned > 0.

 Independent from this, ISTM we should add a
 else if (PageIsAllVisible(page) && !all_visible)
 to those checks.
 
 Can you elaborate, where should that be added?

I was thinking of adding such a warning below
elog(WARNING, "page containing dead tuples is marked as all-visible in
relation \"%s\" page %u", ...)
but we cannot warn against that because GetOldestXmin() can go backwards...

I think it's probably sufficient to set has_dead_tuples = true in the
ItemIdIsDead() branch in lazy_scan_heap(). That should catch relevant
actions from heap_page_prune().

Besides not warning against deletions from heap_page_prune(), the
current warning logic is also buggy for tables without indexes...

/*
 * If there are no indexes then we can vacuum the page right now
 * instead of doing a second scan.
 */
if (nindexes == 0 
vacrelstats-num_dead_tuples  0)
{
/* Remove tuples from heap */
lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, vmbuffer);
has_dead_tuples = false;

That happens before the
else if (PageIsAllVisible(page) && has_dead_tuples)
check.

With regard to fixing things up, ISTM the best bet is heap_prune_chain()
so far. That's executed by vacuum and by opportunistic pruning and we
know we have the appropriate locks there. Looks relatively easy to fix
up things there. Not sure if there are any possible routes to WAL log
this but using log_newpage()?
I am really not sure what the best course of action is :(

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-27 Thread Noah Misch
How would you characterize the chances of this happening with default
*vacuum_freeze_*_age settings?  Offhand, it seems you would need to encounter
this bug during each of ~10 generations of autovacuum_freeze_max_age before
the old rows actually become invisible.
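
For reference, the rough arithmetic behind that estimate, assuming the
default autovacuum_freeze_max_age of 200 million:

  SELECT (2^31)::bigint / 200000000 AS freeze_max_age_generations_to_wraparound;  -- ~10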

On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:
 With regard to fixing things up, ISTM the best bet is heap_prune_chain()
 so far. That's executed by vacuum and by opportunistic pruning and we
 know we have the appropriate locks there. Looks relatively easy to fix
 up things there. Not sure if there are any possible routes to WAL log
 this but using log_newpage()?
 I am really not sure what the best course of action is :(

Maximizing detection is valuable, and the prognosis for automated repair is
poor.  I would want a way to extract tuples having xmin outside the range of
CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible page.  At
first, I supposed we could offer a tool to blindly freeze such tuples.
However, there's no guarantee that they are in harmony with recent changes to
the database; transactions that wrongly considered those tuples invisible may
have made decisions incompatible with their existence.  For example, reviving
such a tuple could violate a UNIQUE constraint if the user had already
replaced the missing row manually.  A module that offers SELECT * FROM
rows_wrongly_invisible('anytable') would aid manual cleanup efforts.
freeze_if_wrongly_invisible(tid) would be useful, too.
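
A very rough sketch of the kind of query such a module might wrap
(hypothetical - 'anytable' is a placeholder, and this relies on a
sequential scan still finding the rows):

  SELECT ctid, xmin
  FROM anytable
  WHERE NOT xmin = 2   -- skip already-frozen tuples (FrozenTransactionId)
    AND age(xmin) > (SELECT age(relfrozenxid) FROM pg_class
                     WHERE oid = 'anytable'::regclass);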

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-27 Thread Andres Freund
On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
 How would you characterize the chances of this happening with default
 *vacuum_freeze_*_age settings?  Offhand, it seems you would need to encounter
 this bug during each of ~10 generations of autovacuum_freeze_max_age before
 the old rows actually become invisible.

I think realistically, to actually trigger the bug, it needs to happen
quite a bit more often. But in some workloads it's pretty darn easy to
hit. E.g. if significant parts of the table are regularly deleted, lots,
if not most, of your vacuums will spuriously increase relfrozenxid above
the actual value. Each time only by a small amount, but due to that
small increase there never will be an actual full table vacuum since
freeze_table_age will never even remotely be reached.
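
Illustratively, how far each table is from the thresholds that would force
a full-table vacuum can be watched with nothing more than the standard
catalog columns and settings:

  SELECT relname,
         age(relfrozenxid) AS xid_age,
         current_setting('vacuum_freeze_table_age') AS freeze_table_age,
         current_setting('autovacuum_freeze_max_age') AS freeze_max_age
  FROM pg_class
  WHERE relkind = 'r'
  ORDER BY age(relfrozenxid) DESC
  LIMIT 10;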

The client that made me look into the issue noticed problems on
pg_attribute - presumably because of temporary table usage primarily
affecting the tail end of pg_attribute.
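
The creep itself is easy to watch from the catalogs; a generic monitoring
query (nothing specific to this bug) is simply:

SELECT relname, relfrozenxid, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 20;

A table whose relfrozenxid keeps advancing even though no full-table vacuum
ever ran against it would be a candidate for closer inspection.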

 On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:
  With regard to fixing things up, ISTM the best bet is heap_prune_chain()
  so far. That's executed by vacuum and by opportunistic pruning and we
  know we have the appropriate locks there. Looks relatively easy to fix
  up things there. Not sure if there are any possible routes to WAL log
  this other than using log_newpage()?
  I am really not sure what the best course of action is :(
 
 Maximizing detection is valuable, and the prognosis for automated repair is
 poor.  I would want a way to extract tuples having xmin outside the range of
 CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible
 page.

I think the likelihood of the problem affecting !all-visible pages is
close to zero. Each vacuum will try to clean those, so they surely will
get vacuumed at some point. I think the only way that could happen is if
the ConditionalLockBufferForCleanup() fails in each vacuum. And that
seems a bit unlikely.

 At first, I supposed we could offer a tool to blindly freeze such tuples.
 However, there's no guarantee that they are in harmony with recent changes to
 the database; transactions that wrongly considered those tuples invisible may
 have made decisions incompatible with their existence.  For example, reviving
 such a tuple could violate a UNIQUE constraint if the user had already
 replaced the missing row manually.

Good point, although since they are all on all-visible pages sequential
scans will currently already find those. It's primarily index scans that
won't. So it's not really reviving them...
The primary reason why I think it might be a good idea to revive
automatically is, that an eventual full-table/freeze vacuum will
currently delete them which seems bad.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-27 Thread Noah Misch
On Wed, Nov 27, 2013 at 10:43:05PM +0100, Andres Freund wrote:
 On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
  How would you characterize the chances of this happening with default
  *vacuum_freeze_*_age settings?  Offhand, it seems you would need to 
  encounter
  this bug during each of ~10 generations of autovacuum_freeze_max_age before
  the old rows actually become invisible.
 
 I think realistically, to actually trigger the bug, it needs to happen
 quite a bit more often. But in some workloads it's pretty darn easy to
 hit. E.g. if significant parts of the table are regularly deleted, lots,
 if not most, of your vacuums will spuriously increase relfrozenxid above
 the actual value. Each time only by a small amount, but due to that
 small increase there never will be an actual full table vacuum since
 freeze_table_age will never even remotely be reached.

That makes sense.

  Maximizing detection is valuable, and the prognosis for automated repair is
  poor.  I would want a way to extract tuples having xmin outside the range of
  CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible
  page.
 
 I think the likelihood of the problem affecting !all-visible pages is
 close to zero. Each vacuum will try to clean those, so they surely will
 get vacuumed at some point. I think the only way that could happen is if
 the ConditionalLockBufferForCleanup() fails in each vacuum. And that
 seems a bit unlikely.

The page could have sat all-visible (through multiple XID epochs, let's say)
until a recent UPDATE.

  At first, I supposed we could offer a tool to blindly freeze such tuples.
  However, there's no guarantee that they are in harmony with recent changes 
  to
  the database; transactions that wrongly considered those tuples invisible 
  may
  have made decisions incompatible with their existence.  For example, 
  reviving
  such a tuple could violate a UNIQUE constraint if the user had already
  replaced the missing row manually.
 
 Good point, although since they are all on all-visible pages sequential
 scans will currently already find those. It's primarily index scans that
 won't. So it's not really reviving them...

True.  Since a dump/reload of the database would already get the duplicate key
violation, the revival is not making anything clearly worse.  And if we hope
for manual repair, many DBAs just won't do that at all.

 The primary reason why I think it might be a good idea to revive
 automatically is, that an eventual full-table/freeze vacuum will
 currently delete them which seems bad.

Will it?  When the page became all-visible, the tuples were all hinted.  They
will never be considered dead.  Every 2B transactions, they will alternate
between live and not-yet-committed.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-27 Thread Andres Freund
On 2013-11-27 18:18:02 -0500, Noah Misch wrote:
  I think the likelihood of the problem affecting !all-visible pages is
  close to zero. Each vacuum will try to clean those, so they surely will
  get vacuumed at some point. I think the only way that could happen is if
  the ConditionalLockBufferForCleanup() fails in each vacuum. And that
  seems a bit unlikely.
 
 The page could have sat all-visible (through multiple XID epochs, let's say)
 until a recent UPDATE.

Good point.

   At first, I supposed we could offer a tool to blindly freeze such tuples.
   However, there's no guarantee that they are in harmony with recent 
   changes to
   the database; transactions that wrongly considered those tuples invisible 
   may
   have made decisions incompatible with their existence.  For example, 
   reviving
   such a tuple could violate a UNIQUE constraint if the user had already
   replaced the missing row manually.
  
  Good point, although since they are all on all-visible pages sequential
  scans will currently already find those. It's primarily index scans that
  won't. So it's not really reviving them...
 
 True.  Since a dump/reload of the database would already get the duplicate key
 violation, the revival is not making anything clearly worse.  And if we hope
 for manual repair, many DBAs just won't do that at all.

Especially if it involves compiling C code...

  The primary reason why I think it might be a good idea to revive
  automatically is, that an eventual full-table/freeze vacuum will
  currently delete them which seems bad.
 
 Will it?  When the page became all-visible, the tuples were all hinted.  They
 will never be considered dead.  Every 2B transactions, they will alternate
 between live and not-yet-committed.

Good point again. And pretty damn good luck.

Although 9.3+ multixact rows look like they could return _DEAD since
we'll do a TransactionIdDidAbort() (via GetUpdateXid()) and
TransactionIdDidCommit() in there and we don't set XMAX_COMMITTED hint
bits for XMAX_IS_MULTI rows. As an additional problem, once multixact.c
has pruned the old multis away we'll get errors from there on.
But that's less likely to affect many rows.
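
For what it's worth, the multixact horizons can be eyeballed from the
catalogs on 9.3+ (just a sketch):

SELECT datname, datminmxid FROM pg_database;
SELECT relname, relminmxid
FROM pg_class
WHERE relkind = 'r'
ORDER BY relminmxid::text::bigint
LIMIT 10;

Tuples still carrying multixact ids older than those horizons would run into
the errors described above once the corresponding members have been pruned.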

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-26 Thread Andres Freund
Hi,

A customer of ours reported that some columns were missing from
pg_attribute. Investigation showed that they were visible to sequential
but not index scans. Looking closer, the page with the missing
attributes is marked as all-visible, but the xids on the individual rows
were xids more than 2^31 in the past - so, effectively in the future and
invisible.
pg_class.relfrozenxid, pg_database.datfrozenxid looked perfectly normal,
not indicating any wraparound and were well past the xid in the
particular rows.
So evidently, something around freezing has gone wrong. We haven't
frozen each row, but updated relfrozenxid regardless.
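
The discrepancy is easy to confirm by forcing both plan shapes by hand (a
sketch; 'sometable' stands in for one of the tables with missing attributes):

BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;
SELECT count(*) FROM pg_attribute WHERE attrelid = 'sometable'::regclass;
SET LOCAL enable_seqscan = off;
SET LOCAL enable_indexscan = on;
SELECT count(*) FROM pg_attribute WHERE attrelid = 'sometable'::regclass;
COMMIT;

If the two counts differ, you are looking at exactly this kind of corruption.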

A longer period of staring revealed a likely reason, in lazy_vacuum_rel:
/* Do the vacuuming */
lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
...
if (whatever)
   lazy_truncate_heap(onerel, vacrelstats);
...
new_frozen_xid = FreezeLimit;
if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
new_frozen_xid = InvalidTransactionId;
but lazy_truncate_heap() does, after it's finished truncating:
vacrelstats->rel_pages = new_rel_pages;

Which means, we might consider a partial vacuum as a vacuum that has
frozen all old rows if just enough pages have been truncated away.

This seems to be the case since
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
scan_all to determine whether we can set new_frozen_xid. That's a slight
pessimization when we scan a relation fully without explicitly scanning
it in its entirety, but given this isn't the first bug around
scanned_pages/rel_pages I'd rather go that way. The aforementioned
commit wasn't primarily concerned with that.
Alternatively we could just compute new_frozen_xid et al before the
lazy_truncate_heap.

This is somewhat nasty :(.

I am not sure how we could fixup the resulting corruption. In some cases
we can check for the page-level all-visible bit and fix up the individual
xids. But it's not guaranteed, although likely, that the page level all
visible bit has been set...
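
FWIW, contrib/pageinspect is enough to eyeball that state by hand (a sketch;
the block number is made up):

CREATE EXTENSION IF NOT EXISTS pageinspect;
-- page-level all-visible flag (PD_ALL_VISIBLE = 0x0004)
SELECT (flags & 4) <> 0 AS pd_all_visible
FROM page_header(get_raw_page('pg_attribute', 123));
-- per-tuple xmin and the HEAP_XMIN_COMMITTED hint bit (0x0100)
SELECT lp, t_xmin, (t_infomask & 256) <> 0 AS xmin_committed
FROM heap_page_items(get_raw_page('pg_attribute', 123))
WHERE t_xmin IS NOT NULL;   -- skip unused/dead line pointers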

Comments?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-26 Thread Robert Haas
On Tue, Nov 26, 2013 at 7:32 AM, Andres Freund and...@2ndquadrant.com wrote:
 This is somewhat nasty :(.

Yeah, that's bad.  Real bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Incomplete freezing when truncating a relation during vacuum

2013-11-26 Thread Andres Freund
On 2013-11-26 13:32:44 +0100, Andres Freund wrote:
 A longer period of staring revealed a likely reason, in lazy_vacuum_rel:
 /* Do the vacuuming */
 lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
 ...
 if (whatever)
lazy_truncate_heap(onerel, vacrelstats);
 ...
 new_frozen_xid = FreezeLimit;
 if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
 new_frozen_xid = InvalidTransactionId;
 but lazy_truncate_heap() does, after it's finished truncating:
 vacrelstats->rel_pages = new_rel_pages;
 
 Which means, we might consider a partial vacuum as a vacuum that has
 frozen all old rows if just enough pages have been truncated away.

repro.sql is a reproducer for the problem.

 This seems to be the case since
 b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
 scan_all to determine whether we can set new_frozen_xid. That's a slight
 pessimization when we scan a relation fully without explicitly scanning
 it in its entirety, but given this isn't the first bug around
 scanned_pages/rel_pages I'd rather go that way. The aforementioned
 commit wasn't primarily concerned with that.
 Alternatively we could just compute new_frozen_xid et al before the
 lazy_truncate_heap.

I've gone for the latter in this preliminary patch. Not increasing
relfrozenxid after an initial data load seems like a bit of a shame.

I wonder if we should just do scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
RESET vacuum_freeze_min_age;
DROP TABLE IF EXISTS vacfailure;
CREATE TABLE vacfailure(id serial primary key, data int);
-- disable autovac so we don't have spurious failures
ALTER TABLE vacfailure SET (AUTOVACUUM_ENABLED = false);

-- insert data and mark as all-visible
INSERT INTO vacfailure(data) SELECT * FROM generate_series(1, 10);
SELECT relname, relfrozenxid, txid_current() FROM pg_class WHERE oid = 'vacfailure'::regclass;
SELECT min(xmin::text::bigint) FROM vacfailure WHERE NOT xmin = 2;
VACUUM VERBOSE vacfailure;
SELECT min(xmin::text::bigint) FROM vacfailure WHERE NOT xmin = 2;

-- so the test doesn't need to create a couple million xids
SET vacuum_freeze_min_age = 0;
SELECT txid_current();SELECT txid_current();SELECT txid_current();
SELECT txid_current();SELECT txid_current();SELECT txid_current();
SELECT txid_current();SELECT txid_current();SELECT txid_current();

SELECT relname, relfrozenxid, txid_current() FROM pg_class WHERE oid = 'vacfailure'::regclass;
-- produce dead rows at the end of the relation
BEGIN;
INSERT INTO vacfailure(data) SELECT * FROM generate_series(1, 50);
ROLLBACK;
-- partial vacuum; VERBOSE should report removing row versions in e.g. 2213 out of 2655 pages or similar
VACUUM VERBOSE vacfailure;
-- if different from the last time, the bug has hit
SELECT relname, relfrozenxid, txid_current() FROM pg_class WHERE oid = 'vacfailure'::regclass;
SELECT min(xmin::text::bigint) FROM vacfailure WHERE NOT xmin = 2;
From 9907880952f40dfd11795625d659ea723352c376 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 26 Nov 2013 23:57:55 +0100
Subject: [PATCH] Don't sometimes incorrectly increase relfrozenxid in vacuums
 that truncate.

(auto-)vacuum recognizes that it can increase relfrozenxid by checking
whether it has processed all pages of a relation and frozen all that
are older than the new relfrozenxid. Unfortunately it performed that
check after truncating the dead pages at the end of the relation and
used the new number of pages to decide whether all pages have been
scanned.

This can lead to relfrozenxid being increased above the actual oldest
xid which in turn can lead to data loss due to xid wraparounds with
some rows suddenly missing. This likely has escaped notice so far
because it takes a large number (~2^31) of xids being used to see the
effect, while a full-table vacuum before that would fix the issue.
---
 src/backend/commands/vacuumlazy.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6778c7d..5d9d32f 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -226,6 +226,24 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	vac_close_indexes(nindexes, Irel, NoLock);
 
 	/*
+	 * Compute whether we actually scanned the whole relation; if we did, we
+	 * can adjust relfrozenxid.
+	 *
+	 * NB: We need to check before truncating the relation, because that will
+	 * change ->rel_pages.
+	 */
+	new_frozen_xid = FreezeLimit;
+	if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
+	{
+		Assert(!scan_all);
+		new_frozen_xid = InvalidTransactionId;
+	}
+
+	new_min_multi = MultiXactCutoff;
+	if (scan_all || vacrelstats->scanned_pages < vacrelstats->rel_pages)
+		new_min_multi = InvalidMultiXactId;
+
+	/*
 	 *