Re: [HACKERS] Possibly too stringent Assert() in b-tree code

2016-09-25 Thread Amit Kapila
On Mon, Sep 26, 2016 at 8:54 AM, Amit Kapila  wrote:
> On Sun, Sep 25, 2016 at 9:31 PM, Tom Lane  wrote:
>> Amit Kapila  writes:
>>> On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane  wrote:
 This is clearly an oversight in Simon's patch fafa374f2, which introduced
 this code without any consideration for the possibility that the page
 doesn't have a valid special area. ...
 but I'm not very clear on whether this is a safe fix, mainly because
 I don't understand what the reuse WAL record really accomplishes.
 Maybe we need to instead generate a reuse record with some special
 transaction ID indicating worst-case assumptions?
>>
>>> I think it is basically to ensure that the queries on standby should
>>> not be considering the page being recycled as valid and if there is
>>> any pending query to which this page is visible, cancel it.  If this
>>> understanding is correct, then I think the fix you are proposing is
>>> correct.
>>
>> After thinking about it some more, I do not understand why we are emitting
>> WAL here at all in *any* case, either for a new page or for a deleted one
>> that we're about to recycle.  Why wouldn't the appropriate actions have
>> been taken when the page was deleted, or even before that when its last
>> tuple was deleted?
>>
>
> It seems to me that we do take actions for conflict resolution during
> the page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO
> which we emit in vacuum), but not sure if that is sufficient.
> Consider a case where the new transaction is started on standby after
>

Here by new transaction, I intend to say some newer snapshot with
valid MyPgXact->xmin.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possibly too stringent Assert() in b-tree code

2016-09-25 Thread Amit Kapila
On Sun, Sep 25, 2016 at 9:31 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane  wrote:
>>> This is clearly an oversight in Simon's patch fafa374f2, which introduced
>>> this code without any consideration for the possibility that the page
>>> doesn't have a valid special area. ...
>>> but I'm not very clear on whether this is a safe fix, mainly because
>>> I don't understand what the reuse WAL record really accomplishes.
>>> Maybe we need to instead generate a reuse record with some special
>>> transaction ID indicating worst-case assumptions?
>
>> I think it is basically to ensure that the queries on standby should
>> not be considering the page being recycled as valid and if there is
>> any pending query to which this page is visible, cancel it.  If this
>> understanding is correct, then I think the fix you are proposing is
>> correct.
>
> After thinking about it some more, I do not understand why we are emitting
> WAL here at all in *any* case, either for a new page or for a deleted one
> that we're about to recycle.  Why wouldn't the appropriate actions have
> been taken when the page was deleted, or even before that when its last
> tuple was deleted?
>

It seems to me that we do take actions for conflict resolution during
the page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO
which we emit in vacuum), but not sure if that is sufficient.
Consider a case where the new transaction is started on standby after
page deletion and the same still precedes the value of xact on page,
such transactions must be cancelled before we can reuse the page.  I
think the fact that before recycling the page we do ensure that no
transaction is interested in that page in master indicates something
similar is required in standby as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possibly too stringent Assert() in b-tree code

2016-09-25 Thread Tom Lane
Amit Kapila  writes:
> On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane  wrote:
>> This is clearly an oversight in Simon's patch fafa374f2, which introduced
>> this code without any consideration for the possibility that the page
>> doesn't have a valid special area. ...
>> but I'm not very clear on whether this is a safe fix, mainly because
>> I don't understand what the reuse WAL record really accomplishes.
>> Maybe we need to instead generate a reuse record with some special
>> transaction ID indicating worst-case assumptions?

> I think it is basically to ensure that the queries on standby should
> not be considering the page being recycled as valid and if there is
> any pending query to which this page is visible, cancel it.  If this
> understanding is correct, then I think the fix you are proposing is
> correct.

After thinking about it some more, I do not understand why we are emitting
WAL here at all in *any* case, either for a new page or for a deleted one
that we're about to recycle.  Why wouldn't the appropriate actions have
been taken when the page was deleted, or even before that when its last
tuple was deleted?

In other words, if I'm left to fix it, I will do so by reverting
fafa374f2 lock stock and barrel.  I think it was ill-considered and
unnecessary.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possibly too stringent Assert() in b-tree code

2016-09-24 Thread Amit Kapila
On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Sep 19, 2016 at 7:07 AM, Amit Kapila  wrote:
>>> I think you have a valid point.  It seems we don't need to write WAL
>>> for reuse page (aka don't call _bt_log_reuse_page()), if the page is
>>> new, as the only purpose of that log is to handle conflict based on
>>> transaction id stored in special area which will be anyway zero.
>
>> +1.
>
> This is clearly an oversight in Simon's patch fafa374f2, which introduced
> this code without any consideration for the possibility that the page
> doesn't have a valid special area.  We could prevent the crash by
> doing nothing if PageIsNew, a la
>
> if (_bt_page_recyclable(page))
> {
> /*
>  * If we are generating WAL for Hot Standby then create a
>  * WAL record that will allow us to conflict with queries
>  * running on standby.
>  */
> -   if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
> +   if (XLogStandbyInfoActive() && RelationNeedsWAL(rel) &&
> +   !PageIsNew(page))
> {
> BTPageOpaque opaque = (BTPageOpaque) 
> PageGetSpecialPointer(page);
>
> _bt_log_reuse_page(rel, blkno, opaque->btpo.xact);
> }
>
> /* Okay to use page.  Re-initialize and return it */
>
> but I'm not very clear on whether this is a safe fix, mainly because
> I don't understand what the reuse WAL record really accomplishes.
> Maybe we need to instead generate a reuse record with some special
> transaction ID indicating worst-case assumptions?
>

I think it is basically to ensure that the queries on standby should
not be considering the page being recycled as valid and if there is
any pending query to which this page is visible, cancel it.  If this
understanding is correct, then I think the fix you are proposing is
correct.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possibly too stringent Assert() in b-tree code

2016-09-22 Thread Tom Lane
Robert Haas  writes:
> On Mon, Sep 19, 2016 at 7:07 AM, Amit Kapila  wrote:
>> I think you have a valid point.  It seems we don't need to write WAL
>> for reuse page (aka don't call _bt_log_reuse_page()), if the page is
>> new, as the only purpose of that log is to handle conflict based on
>> transaction id stored in special area which will be anyway zero.

> +1.

This is clearly an oversight in Simon's patch fafa374f2, which introduced
this code without any consideration for the possibility that the page
doesn't have a valid special area.  We could prevent the crash by
doing nothing if PageIsNew, a la

if (_bt_page_recyclable(page))
{
/*
 * If we are generating WAL for Hot Standby then create a
 * WAL record that will allow us to conflict with queries
 * running on standby.
 */
-   if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
+   if (XLogStandbyInfoActive() && RelationNeedsWAL(rel) &&
+   !PageIsNew(page))
{
BTPageOpaque opaque = (BTPageOpaque) 
PageGetSpecialPointer(page);

_bt_log_reuse_page(rel, blkno, opaque->btpo.xact);
}

/* Okay to use page.  Re-initialize and return it */

but I'm not very clear on whether this is a safe fix, mainly because
I don't understand what the reuse WAL record really accomplishes.
Maybe we need to instead generate a reuse record with some special
transaction ID indicating worst-case assumptions?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possibly too stringent Assert() in b-tree code

2016-09-19 Thread Robert Haas
On Mon, Sep 19, 2016 at 7:07 AM, Amit Kapila  wrote:
>> Of course, the database could have been corrupted after having encountered
>> many crashes during my experiments. Neverthelesss, even without in-depth
>> knowledge of the b-tree code I suspect that this stack trace might reflect a
>> legal situation. In partcular, if _bt_page_recyclable() returned on this
>> condition:
>>
>> if (PageIsNew(page))
>> return true;
>>
>
> I think you have a valid point.  It seems we don't need to write WAL
> for reuse page (aka don't call _bt_log_reuse_page()), if the page is
> new, as the only purpose of that log is to handle conflict based on
> transaction id stored in special area which will be anyway zero.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possibly too stringent Assert() in b-tree code

2016-09-19 Thread Amit Kapila
On Mon, Sep 19, 2016 at 1:32 PM, Antonin Houska  wrote:
> I've recently seen this when using 9.6:
>
> #0  0x7f147892f0c7 in raise () from /lib64/libc.so.6
> #1  0x7f1478930478 in abort () from /lib64/libc.so.6
> #2  0x009683a1 in ExceptionalCondition (conditionName=0x9f2ef8 
> "!(((PageHeader) (page))->pd_special >= (__builtin_offsetof (PageHeaderData, 
> pd_linp)))", errorType=0x9f2e8a "FailedAssertion",
> fileName=0x9f2e60 "../../../../src/include/storage/bufpage.h", 
> lineNumber=314) at assert.c:54
> #3  0x004ee3d2 in PageValidateSpecialPointer (page=0x7f14700f3480 "") 
> at ../../../../src/include/storage/bufpage.h:314
> #4  0x004ef913 in _bt_getbuf (rel=0x7f14795a1a50, blkno=11, access=2) 
> at nbtpage.c:629
> #5  0x004eae8c in _bt_split (rel=0x7f14795a1a50, buf=136, cbuf=0, 
> firstright=367, newitemoff=408, newitemsz=16, newitem=0x2608330, 
> newitemonleft=0 '\000') at nbtinsert.c:986
> #6  0x004ea563 in _bt_insertonpg (rel=0x7f14795a1a50, buf=136, 
> cbuf=0, stack=0x26090c8, itup=0x2608330, newitemoff=408, split_only_page=0 
> '\000') at nbtinsert.c:781
> #7  0x004e954c in _bt_doinsert (rel=0x7f14795a1a50, itup=0x2608330, 
> checkUnique=UNIQUE_CHECK_YES, heapRel=0x25aa0a0) at nbtinsert.c:204
> #8  0x004f3b73 in btinsert (rel=0x7f14795a1a50, 
> values=0x7ffe46c6f7f0, isnull=0x7ffe46c6f7d0 "", ht_ctid=0x260927c, 
> heapRel=0x25aa0a0, checkUnique=UNIQUE_CHECK_YES) at nbtree.c:279
> #9  0x004e7964 in index_insert (indexRelation=0x7f14795a1a50, 
> values=0x7ffe46c6f7f0, isnull=0x7ffe46c6f7d0 "", heap_t_ctid=0x260927c, 
> heapRelation=0x25aa0a0, checkUnique=UNIQUE_CHECK_YES)
> at indexam.c:204
>
> Of course, the database could have been corrupted after having encountered
> many crashes during my experiments. Neverthelesss, even without in-depth
> knowledge of the b-tree code I suspect that this stack trace might reflect a
> legal situation. In partcular, if _bt_page_recyclable() returned on this
> condition:
>
> if (PageIsNew(page))
> return true;
>

I think you have a valid point.  It seems we don't need to write WAL
for reuse page (aka don't call _bt_log_reuse_page()), if the page is
new, as the only purpose of that log is to handle conflict based on
transaction id stored in special area which will be anyway zero.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers