Re: [HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-03 Thread Robert Haas
On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund and...@2ndquadrant.com wrote:
 While reading around which references to SnapshotData's members exist, I
 once more came about the following tidbit in heapgetpage():
 /*
  * If the all-visible flag indicates that all tuples on the page are
  * visible to everyone, we can skip the per-tuple visibility tests.
  *
  * Note: In hot standby, a tuple that's already visible to all
  * transactions in the master might still be invisible to a read-only
  * transaction in the standby. We partly handle this problem by 
 tracking
  * the minimum xmin of visible tuples as the cut-off XID while 
 marking a
  * page all-visible on master and WAL log that along with the 
 visibility
  * map SET operation. In hot standby, we wait for (or abort) all
  * transactions that can potentially may not see one or more tuples 
 on the
  * page. That's how index-only scans work fine in hot standby. A 
 crucial
  * difference between index-only scans and heap scans is that the
  * index-only scan completely relies on the visibility map where as 
 heap
  * scan looks at the page-level PD_ALL_VISIBLE flag. We are not sure 
 if
  * the page-level flag can be trusted in the same way, because it 
 might
  * get propagated somehow without being explicitly WAL-logged, e.g. 
 via a
  * full page write. Until we can prove that beyond doubt, let's check 
 each
  * tuple for visibility the hard way.
  */
 all_visible = PageIsAllVisible(dp)  !snapshot-takenDuringRecovery;

 I don't think this is neccessary = 9.2. The are two only interestings place
 where PD_ALL_VISIBLE is set:
 a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
PD_ALL_VISIBLE/the vm is touched and that causes recovery
conflicts. The heap page is locked for cleanup at that point. As the
logging of xl_heap_clean sets the page's LSN there's no way the page
can appear on the standby too early.
 b) empty pages in lazy_scan_heap(). If they always were empty, there's
no need for conflicts. The only other way I can see to end up there
is a previous heap_page_prune() that repaired fragmentation. But that
logs a WAL record with conflict information.

I don't think there's any reason to believe that lazy_scan_heap() can
only hit pages that are empty or have just been defragged.  Suppose
that there's a tuple on the page which was recently inserted; the
inserting transaction has committed but there are some backends that
still have older snapshots.  The page won't be marked all-visible
because it isn't.  Now, eventually those older snapshots will go away,
and sometime after that the relation will get vacuumed again, and
we'll once again look the page.  But this time we notice that it is
all-visible, and mark it so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-03 Thread Andres Freund
On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
 On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund and...@2ndquadrant.com wrote:
  While reading around which references to SnapshotData's members exist, I
  once more came about the following tidbit in heapgetpage():
  /*
   * If the all-visible flag indicates that all tuples on the page are
   * visible to everyone, we can skip the per-tuple visibility tests.
   *
   * Note: In hot standby, a tuple that's already visible to all
   * transactions in the master might still be invisible to a 
  read-only
   * transaction in the standby. We partly handle this problem by 
  tracking
   * the minimum xmin of visible tuples as the cut-off XID while 
  marking a
   * page all-visible on master and WAL log that along with the 
  visibility
   * map SET operation. In hot standby, we wait for (or abort) all
   * transactions that can potentially may not see one or more tuples 
  on the
   * page. That's how index-only scans work fine in hot standby. A 
  crucial
   * difference between index-only scans and heap scans is that the
   * index-only scan completely relies on the visibility map where as 
  heap
   * scan looks at the page-level PD_ALL_VISIBLE flag. We are not 
  sure if
   * the page-level flag can be trusted in the same way, because it 
  might
   * get propagated somehow without being explicitly WAL-logged, e.g. 
  via a
   * full page write. Until we can prove that beyond doubt, let's 
  check each
   * tuple for visibility the hard way.
   */
  all_visible = PageIsAllVisible(dp)  
  !snapshot-takenDuringRecovery;
 
  I don't think this is neccessary = 9.2. The are two only interestings 
  place
  where PD_ALL_VISIBLE is set:
  a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
 PD_ALL_VISIBLE/the vm is touched and that causes recovery
 conflicts. The heap page is locked for cleanup at that point. As the
 logging of xl_heap_clean sets the page's LSN there's no way the page
 can appear on the standby too early.
  b) empty pages in lazy_scan_heap(). If they always were empty, there's
 no need for conflicts. The only other way I can see to end up there
 is a previous heap_page_prune() that repaired fragmentation. But that
 logs a WAL record with conflict information.
 
 I don't think there's any reason to believe that lazy_scan_heap() can
 only hit pages that are empty or have just been defragged.  Suppose
 that there's a tuple on the page which was recently inserted; the
 inserting transaction has committed but there are some backends that
 still have older snapshots.  The page won't be marked all-visible
 because it isn't.  Now, eventually those older snapshots will go away,
 and sometime after that the relation will get vacuumed again, and
 we'll once again look the page.  But this time we notice that it is
 all-visible, and mark it so.

Hm, right, that can happen. How about adding a LSN interlock in
visibilitymap_set() for those cases as well, not just for checksums?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 7:07 AM, Andres Freund and...@2ndquadrant.com wrote:
 I don't think there's any reason to believe that lazy_scan_heap() can
 only hit pages that are empty or have just been defragged.  Suppose
 that there's a tuple on the page which was recently inserted; the
 inserting transaction has committed but there are some backends that
 still have older snapshots.  The page won't be marked all-visible
 because it isn't.  Now, eventually those older snapshots will go away,
 and sometime after that the relation will get vacuumed again, and
 we'll once again look the page.  But this time we notice that it is
 all-visible, and mark it so.

 Hm, right, that can happen. How about adding a LSN interlock in
 visibilitymap_set() for those cases as well, not just for checksums?

Well, if I'm correctly understanding what you're proposing, that would
involve emitting an FPI for each page when we vacuum the
newly-inserted portion of an insert-only table.  That's been
repeatedly proposed in the past, but I've opposed it on the grounds
that it makes vacuum much more expensive in such cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-03 Thread Andres Freund
On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
 On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund and...@2ndquadrant.com wrote:
  I don't think this is neccessary = 9.2. The are two only interestings 
  place
  where PD_ALL_VISIBLE is set:
  a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
 PD_ALL_VISIBLE/the vm is touched and that causes recovery
 conflicts. The heap page is locked for cleanup at that point. As the
 logging of xl_heap_clean sets the page's LSN there's no way the page
 can appear on the standby too early.
  b) empty pages in lazy_scan_heap(). If they always were empty, there's
 no need for conflicts. The only other way I can see to end up there
 is a previous heap_page_prune() that repaired fragmentation. But that
 logs a WAL record with conflict information.
 
 I don't think there's any reason to believe that lazy_scan_heap() can
 only hit pages that are empty or have just been defragged.  Suppose
 that there's a tuple on the page which was recently inserted; the
 inserting transaction has committed but there are some backends that
 still have older snapshots.  The page won't be marked all-visible
 because it isn't.  Now, eventually those older snapshots will go away,
 and sometime after that the relation will get vacuumed again, and
 we'll once again look the page.  But this time we notice that it is
 all-visible, and mark it so.

Right now I am missing how this isn't an actual correctness problem
after a crash. Without an LSN interlock we could crash *after* the heap
page has been written out, but *before* the vm WAL record has been
flushed to disk. Combined with synchronous_commit=off there could be
transactions that appeared as safely committed for vacuum (i.e. are
below GetOldestXmin()), but which are actually aborted after the
commit.
Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN),
but that doesn't work here.

Am I missing something?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 8:33 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
 On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund and...@2ndquadrant.com wrote:
  I don't think this is neccessary = 9.2. The are two only interestings 
  place
  where PD_ALL_VISIBLE is set:
  a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
 PD_ALL_VISIBLE/the vm is touched and that causes recovery
 conflicts. The heap page is locked for cleanup at that point. As the
 logging of xl_heap_clean sets the page's LSN there's no way the page
 can appear on the standby too early.
  b) empty pages in lazy_scan_heap(). If they always were empty, there's
 no need for conflicts. The only other way I can see to end up there
 is a previous heap_page_prune() that repaired fragmentation. But that
 logs a WAL record with conflict information.

 I don't think there's any reason to believe that lazy_scan_heap() can
 only hit pages that are empty or have just been defragged.  Suppose
 that there's a tuple on the page which was recently inserted; the
 inserting transaction has committed but there are some backends that
 still have older snapshots.  The page won't be marked all-visible
 because it isn't.  Now, eventually those older snapshots will go away,
 and sometime after that the relation will get vacuumed again, and
 we'll once again look the page.  But this time we notice that it is
 all-visible, and mark it so.

 Right now I am missing how this isn't an actual correctness problem
 after a crash. Without an LSN interlock we could crash *after* the heap
 page has been written out, but *before* the vm WAL record has been
 flushed to disk.

Yes.  In that case, the PD_ALL_VISIBLE bit will be set on the page,
but the corresponding visibility map bit will be unset.

 Combined with synchronous_commit=off there could be
 transactions that appeared as safely committed for vacuum (i.e. are
 below GetOldestXmin()), but which are actually aborted after the
 commit.
 Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN),
 but that doesn't work here.

Well, we'd better not try to mark a page all-visible if it's only
all-visible on the assumption that unwritten xlog will be successfully
flushed to disk.  But lazy_scan_heap() has code that only regards the
tuple as all-visible once the tuple is hinted committed, and there's
code elsewhere to keep hint bits from being set too early.  And
heap_page_is_all_visible() follows the same pattern.  So I think it's
OK, but maybe you see something I don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-02 Thread Andres Freund
Hi,

I am currently playing around with Robert's suggestion to get rid of
changeset extraction's reusage of SnapshotData fields (basically that
xip contains committed, not uncommited transactions) by using NodeTag
similar to many other (families of) structs.

While reading around which references to SnapshotData's members exist, I
once more came about the following tidbit in heapgetpage():
/*
 * If the all-visible flag indicates that all tuples on the page are
 * visible to everyone, we can skip the per-tuple visibility tests.
 *
 * Note: In hot standby, a tuple that's already visible to all
 * transactions in the master might still be invisible to a read-only
 * transaction in the standby. We partly handle this problem by tracking
 * the minimum xmin of visible tuples as the cut-off XID while marking a
 * page all-visible on master and WAL log that along with the visibility
 * map SET operation. In hot standby, we wait for (or abort) all
 * transactions that can potentially may not see one or more tuples on 
the
 * page. That's how index-only scans work fine in hot standby. A crucial
 * difference between index-only scans and heap scans is that the
 * index-only scan completely relies on the visibility map where as heap
 * scan looks at the page-level PD_ALL_VISIBLE flag. We are not sure if
 * the page-level flag can be trusted in the same way, because it might
 * get propagated somehow without being explicitly WAL-logged, e.g. via 
a
 * full page write. Until we can prove that beyond doubt, let's check 
each
 * tuple for visibility the hard way.
 */
all_visible = PageIsAllVisible(dp)  !snapshot-takenDuringRecovery;

I don't think this is neccessary = 9.2. The are two only interestings place
where PD_ALL_VISIBLE is set:
a) lazy_vacuum_page() where a xl_heap_clean is logged *before*
   PD_ALL_VISIBLE/the vm is touched and that causes recovery
   conflicts. The heap page is locked for cleanup at that point. As the
   logging of xl_heap_clean sets the page's LSN there's no way the page
   can appear on the standby too early.
b) empty pages in lazy_scan_heap(). If they always were empty, there's
   no need for conflicts. The only other way I can see to end up there
   is a previous heap_page_prune() that repaired fragmentation. But that
   logs a WAL record with conflict information.

So, we could just remove this?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers