Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-04-03 Thread Robert Haas
On Fri, Mar 31, 2017 at 10:02 PM, Ashutosh Sharma  wrote:
> As suggested, I am now acquiring lock inside the caller function.
> Attached is the patch having this changes. Thanks.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-04-01 Thread Ashutosh Sharma
On Apr 1, 2017 18:11, "Amit Kapila"  wrote:

On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas  wrote:
> On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma 
wrote:
>> Well, That is another option but the main idea was to be inline with
>> the btree code.
>
> That's not a bad goal in principal, but _bt_killitems() doesn't have
> any similar argument.
>
> (Also, speaking of consistency, why did we end up with
> _hash_kill_items, with an underscore between kill and items, and
> _bt_killitems, without one?)
>
>> Moreover, I am not sure if acquiring lwlock inside
>> hashendscan (basically the place where we are trying to close down the
>> things) would look good.
>
> Well, if that's not a good thing to do, hiding it inside some other
> function doesn't make it better.  I think it's fine, though.
>

One thing to note here is that this fix won't be required if we get
the page-scan-mode patch [1] in this CF.  I think if we fix this with
the patch proposed by Ashutosh, then we anyway needs to again change
the related code (kind of revert the fix) after page-scan-mode patch.
Now, if you think we have negligible chance of getting that patch,
then it is good to proceed with this fix.


Yes, I had already mentioned this point in my very first mail. Thanks for
highlighting this once again.

[1] - https://commitfest.postgresql.org/13/999/


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-04-01 Thread Amit Kapila
On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas  wrote:
> On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma  
> wrote:
>> Well, That is another option but the main idea was to be inline with
>> the btree code.
>
> That's not a bad goal in principal, but _bt_killitems() doesn't have
> any similar argument.
>
> (Also, speaking of consistency, why did we end up with
> _hash_kill_items, with an underscore between kill and items, and
> _bt_killitems, without one?)
>
>> Moreover, I am not sure if acquiring lwlock inside
>> hashendscan (basically the place where we are trying to close down the
>> things) would look good.
>
> Well, if that's not a good thing to do, hiding it inside some other
> function doesn't make it better.  I think it's fine, though.
>

One thing to note here is that this fix won't be required if we get
the page-scan-mode patch [1] in this CF.  I think if we fix this with
the patch proposed by Ashutosh, then we anyway needs to again change
the related code (kind of revert the fix) after page-scan-mode patch.
Now, if you think we have negligible chance of getting that patch,
then it is good to proceed with this fix.

[1] - https://commitfest.postgresql.org/13/999/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-04-01 Thread Ashutosh Sharma
Hi Andreas,

On Apr 1, 2017 16:15, "Andreas Seltenreich"  wrote:

Andreas Seltenreich writes:
>>> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*)
(&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)
> I got about one TRAP per hour when testing on 20 nodes with one postgres
> and 5 sqlsmithes on each.

> Ashutosh Sharma writes:
>> [2. reacquire_lock_hashkillitems_if_required.patch]

I've done about 12 hours of testing since applying this patch and no
failed assertions so far.



Thanks for testing my patch. Just FYI, I have slightly changed my patch
this morning as per Robert's suggestions. But, still more or less the logic
remains the same. Thank you once again.


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-04-01 Thread Andreas Seltenreich
Andreas Seltenreich writes:
>>> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) 
>>> (&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)
> I got about one TRAP per hour when testing on 20 nodes with one postgres
> and 5 sqlsmithes on each.

> Ashutosh Sharma writes:
>> [2. reacquire_lock_hashkillitems_if_required.patch]

I've done about 12 hours of testing since applying this patch and no
failed assertions so far.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Ashutosh Sharma
Hi,

On Sat, Apr 1, 2017 at 7:06 AM, Ashutosh Sharma  wrote:
> On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas  wrote:
>> On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma  
>> wrote:
>>> Well, That is another option but the main idea was to be inline with
>>> the btree code.
>>
>> That's not a bad goal in principal, but _bt_killitems() doesn't have
>> any similar argument.
>
> It was there but later got removed with some patch (may be the patch
> for reducing pinning and buffer content lock for btree scans). The
> following commit has this changes.
>
> commit 09cb5c0e7d6fbc9dee26dc429e4fc0f2a88e5272
> Author: Tom Lane 
> Date:   Sun May 7 01:21:30 2006 +
>
>>
>> (Also, speaking of consistency, why did we end up with
>> _hash_kill_items, with an underscore between kill and items, and
>> _bt_killitems, without one?)
>
> That is just to follow the naming convention in hash.h Please check
> the review comments for this at - [1].
>
>>
>>> Moreover, I am not sure if acquiring lwlock inside
>>> hashendscan (basically the place where we are trying to close down the
>>> things) would look good.
>>
>> Well, if that's not a good thing to do, hiding it inside some other
>> function doesn't make it better.
>
> okay, agreed. I will submit the patch very shortly.

As suggested, I am now acquiring lock inside the caller function.
Attached is the patch having this changes. Thanks.

>
>
> [1] - 
> https://www.postgresql.org/message-id/cf6abd8a-77b5-f1c7-8e50-5bef461e0522%40redhat.com

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..b835f77 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -476,9 +476,17 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
 	Relation	rel = scan->indexRelation;
 
-	/* Before leaving current page, deal with any killed items */
+	/*
+	 * Before leaving current page, deal with any killed items.
+	 * Also, ensure that we acquire lock on current page before
+	 * calling _hash_kill_items.
+	 */
 	if (so->numKilled > 0)
+	{
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
 		_hash_kill_items(scan);
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+	}
 
 	_hash_dropscanbuf(rel, so);
 
@@ -507,9 +515,17 @@ hashendscan(IndexScanDesc scan)
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
 	Relation	rel = scan->indexRelation;
 
-	/* Before leaving current page, deal with any killed items */
+	/*
+	 * Before leaving current page, deal with any killed items.
+	 * Also, ensure that we acquire lock on current page before
+	 * calling _hash_kill_items.
+	 */
 	if (so->numKilled > 0)
+	{
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
 		_hash_kill_items(scan);
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+	}
 
 	_hash_dropscanbuf(rel, so);
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Ashutosh Sharma
On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas  wrote:
> On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma  
> wrote:
>> Well, That is another option but the main idea was to be inline with
>> the btree code.
>
> That's not a bad goal in principal, but _bt_killitems() doesn't have
> any similar argument.

It was there but later got removed with some patch (may be the patch
for reducing pinning and buffer content lock for btree scans). The
following commit has this changes.

commit 09cb5c0e7d6fbc9dee26dc429e4fc0f2a88e5272
Author: Tom Lane 
Date:   Sun May 7 01:21:30 2006 +

>
> (Also, speaking of consistency, why did we end up with
> _hash_kill_items, with an underscore between kill and items, and
> _bt_killitems, without one?)

That is just to follow the naming convention in hash.h Please check
the review comments for this at - [1].

>
>> Moreover, I am not sure if acquiring lwlock inside
>> hashendscan (basically the place where we are trying to close down the
>> things) would look good.
>
> Well, if that's not a good thing to do, hiding it inside some other
> function doesn't make it better.

okay, agreed. I will submit the patch very shortly.


[1] - 
https://www.postgresql.org/message-id/cf6abd8a-77b5-f1c7-8e50-5bef461e0522%40redhat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma  wrote:
> Well, That is another option but the main idea was to be inline with
> the btree code.

That's not a bad goal in principal, but _bt_killitems() doesn't have
any similar argument.

(Also, speaking of consistency, why did we end up with
_hash_kill_items, with an underscore between kill and items, and
_bt_killitems, without one?)

> Moreover, I am not sure if acquiring lwlock inside
> hashendscan (basically the place where we are trying to close down the
> things) would look good.

Well, if that's not a good thing to do, hiding it inside some other
function doesn't make it better.  I think it's fine, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Ashutosh Sharma
On Sat, Apr 1, 2017 at 1:08 AM, Robert Haas  wrote:
> On Mon, Mar 27, 2017 at 5:39 AM, Ashutosh Sharma  
> wrote:
>> Thanks for reporting this problem. Could you please let me know on for
>> how long did you run sqlsmith to get this crash. However, I have found
>> the reason for this crash. This is basically happening when trying to
>> retrieve the tuples using cursor. Basically the current hash index
>> scan work tuple-at-a-time which means once it finds tuple on page, it
>> releases lock from the page but keeps pin on it and finally returns
>> the tuple. When the requested number of tuples are processed there is
>> no lock on the page that was being scanned but yes there is a pin on
>> it. Finally, when trying to close a cursor at the end of scan, if any
>> killed tuples has been identified we try to first mark these items as
>> dead with the help of _hash_kill_items(). But, since we only have pin
>> on this page, the assert check 'LWLockHeldByMe()' fails.
>
> Instead of adding bool haveLock to _hash_kill_items, how about just
> having the caller acquire the lock before calling the function and
> release it afterwards?  Since the acquire is at the beginning of the
> function and the release at the end, there seems to be no point in
> putting the acquire/release inside the function rather than in the
> caller.

Well, That is another option but the main idea was to be inline with
the btree code. Moreover, I am not sure if acquiring lwlock inside
hashendscan (basically the place where we are trying to close down the
things) would look good.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Robert Haas
On Mon, Mar 27, 2017 at 5:39 AM, Ashutosh Sharma  wrote:
> Thanks for reporting this problem. Could you please let me know on for
> how long did you run sqlsmith to get this crash. However, I have found
> the reason for this crash. This is basically happening when trying to
> retrieve the tuples using cursor. Basically the current hash index
> scan work tuple-at-a-time which means once it finds tuple on page, it
> releases lock from the page but keeps pin on it and finally returns
> the tuple. When the requested number of tuples are processed there is
> no lock on the page that was being scanned but yes there is a pin on
> it. Finally, when trying to close a cursor at the end of scan, if any
> killed tuples has been identified we try to first mark these items as
> dead with the help of _hash_kill_items(). But, since we only have pin
> on this page, the assert check 'LWLockHeldByMe()' fails.

Instead of adding bool haveLock to _hash_kill_items, how about just
having the caller acquire the lock before calling the function and
release it afterwards?  Since the acquire is at the beginning of the
function and the release at the end, there seems to be no point in
putting the acquire/release inside the function rather than in the
caller.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-27 Thread Ashutosh Sharma
On Mar 28, 2017 00:00, "Andreas Seltenreich"  wrote:

Ashutosh Sharma writes:
>> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*)
(&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)
> Thanks for reporting this problem. Could you please let me know on for
> how long did you run sqlsmith to get this crash.

I got about one TRAP per hour when testing on 20 nodes with one postgres
and 5 sqlsmithes on each.  Nodes are tiny consumer machines with
low-power 4-core sandy bridges.


Okay. Thanks for sharing this information. I tried running running sqlsmith
on a single node. I ran it for an hour but didn't find any crash. Let me
also try running multiple sqlsmith at a time...May be 5 like you are doing
on a single node.


> [2. reacquire_lock_hashkillitems_if_required.patch]

I'll test with your patch applied as soon as time permits and report
back.


Sure, Thanks a lot.

With Regards,
Ashutosh Sharma


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-27 Thread Andreas Seltenreich
Ashutosh Sharma writes:
>> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) 
>> (&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)
> Thanks for reporting this problem. Could you please let me know on for
> how long did you run sqlsmith to get this crash.

I got about one TRAP per hour when testing on 20 nodes with one postgres
and 5 sqlsmithes on each.  Nodes are tiny consumer machines with
low-power 4-core sandy bridges.

> [2. reacquire_lock_hashkillitems_if_required.patch]

I'll test with your patch applied as soon as time permits and report
back.

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-27 Thread Ashutosh Sharma
Hi,

> testing with master as of cf366e97ff, sqlsmith occasionally triggers the
> following assertion:
>
> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) 
> (&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)
>
> Backtraces always look like the one below.  It is reproducible on a
> cluster once it happens.  I could provide a tarball if needed.
>
> regards,
> Andreas
>
> #2  0x008324b1 in ExceptionalCondition 
> (conditionName=conditionName@entry=0x9e4e28 "!(LWLockHeldByMe(((LWLock*) 
> (&(bufHdr)->content_lock", errorType=errorType@entry=0x87b03d 
> "FailedAssertion", fileName=fileName@entry=0x9e5856 "bufmgr.c", 
> lineNumber=lineNumber@entry=3397) at assert.c:54
> #3  0x00706971 in MarkBufferDirtyHint (buffer=2844, 
> buffer_std=buffer_std@entry=1 '\001') at bufmgr.c:3397
> #4  0x004b3ecd in _hash_kill_items (scan=scan@entry=0x66dcf70) at 
> hashutil.c:514
> #5  0x004a9c1b in hashendscan (scan=0x66dcf70) at hash.c:512
> #6  0x004cf17a in index_endscan (scan=0x66dcf70) at indexam.c:353
> #7  0x0061fa51 in ExecEndIndexScan (node=0x3093f30) at 
> nodeIndexscan.c:852
> #8  0x00608e59 in ExecEndNode (node=) at 
> execProcnode.c:715
> #9  0x006045b8 in ExecEndPlan (estate=0x3064000, planstate= out>) at execMain.c:1540
> #10 standard_ExecutorEnd (queryDesc=0x30cb880) at execMain.c:487
> #11 0x005c87b0 in PortalCleanup (portal=0x1a60060) at portalcmds.c:302
> #12 0x0085cbb3 in PortalDrop (portal=0x1a60060, 
> isTopCommit=) at portalmem.c:489
> #13 0x00736ed2 in exec_simple_query (query_string=0x315b7a0 "...") at 
> postgres.c:
> #14 0x00738b51 in PostgresMain (argc=, 
> argv=argv@entry=0x1a6c6c8, dbname=, username=) 
> at postgres.c:4071
> #15 0x00475fef in BackendRun (port=0x1a65b90) at postmaster.c:4317
> #16 BackendStartup (port=0x1a65b90) at postmaster.c:3989
> #17 ServerLoop () at postmaster.c:1729
> #18 0x006c8662 in PostmasterMain (argc=argc@entry=4, 
> argv=argv@entry=0x1a3f540) at postmaster.c:1337
> #19 0x0047729d in main (argc=4, argv=0x1a3f540) at main.c:228
>


Hi,

Thanks for reporting this problem. Could you please let me know on for
how long did you run sqlsmith to get this crash. However, I have found
the reason for this crash. This is basically happening when trying to
retrieve the tuples using cursor. Basically the current hash index
scan work tuple-at-a-time which means once it finds tuple on page, it
releases lock from the page but keeps pin on it and finally returns
the tuple. When the requested number of tuples are processed there is
no lock on the page that was being scanned but yes there is a pin on
it. Finally, when trying to close a cursor at the end of scan, if any
killed tuples has been identified we try to first mark these items as
dead with the help of _hash_kill_items(). But, since we only have pin
on this page, the assert check 'LWLockHeldByMe()' fails.

When scanning tuples using normal SELECT * statement, before moving to
next page in a bucket we first deal with all the killed items but we
do this without releasing lock and pin on the current page. Hence,
with SELECT queries this crash is not visible.

The attached patch fixes this. But, please note that all these changes
will get removed with the patch for page scan mode - [1].

[1] - 
https://www.postgresql.org/message-id/CA%2BTgmobYTvexcjqMhXoNCyEUHChzmdC_2xVGgj7eqaYVgoJA%2Bg%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..0bacef8 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -478,7 +478,7 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, false);
 
 	_hash_dropscanbuf(rel, so);
 
@@ -509,7 +509,7 @@ hashendscan(IndexScanDesc scan)
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, false);
 
 	_hash_dropscanbuf(rel, so);
 
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 2d92049..414cc6a 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -467,7 +467,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, true);
 
 	/*
 	 * ran off the end of this page, try the next
@@ -524,7 +524,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+