Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Wed, Aug 24, 2016 at 11:44 AM, Mark Kirkwoodwrote: > On 24/08/16 17:01, Mark Kirkwood wrote: >> >> >> ...actually I was wrong there, only 2 of them were the same. So I've >> attached a new log of bt's of them all. >> >> >> > > And I can reproduce with only 1 session, figured that might be a helpful > piece of the puzzle (trace attached). > Thanks. I think I know the problem here. Basically _hash_freeovflpage() is trying to take a lock on a buffer previous to overflow page to update the links and it is quite possible that the same buffer is already locked for moving the tuples while squeezing the bucket. I am working on a fix for the same. Coincidently, Ashutosh Sharma a colleague of mine who was also testing this patch found the same issue by an attached sql script. So we might be able to inculcate a test case in the regression suite as well after fix. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com test_hash.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 24/08/16 17:01, Mark Kirkwood wrote: ...actually I was wrong there, only 2 of them were the same. So I've attached a new log of bt's of them all. And I can reproduce with only 1 session, figured that might be a helpful piece of the puzzle (trace attached). $ pgbench -c 1 -T600 bench bench=# SELECT pid,state,now()-xact_start AS wait,wait_event_type,wait_event,query FROM pg_stat_activity WHERE datname='bench' ORDER BY wait DESC; pid | state | wait | wait_event_type | wait_event | query ---++-+-++--- - 17266 | active | 00:45:23.027555 | LWLockTranche | buffer_content | INSERT INTO pgbenc h_history (tid, bid, aid, delta, mtime) VALUES (352, 76, 1305123, -1604, CURRENT_TIMESTAM P); (1 row) $ gdb -p 17266 (gdb) bt #0 0x7f7b4d533387 in semop () at ../sysdeps/unix/syscall-template.S:84 #1 0x00660331 in PGSemaphoreLock (sema=sema@entry=0x7f7b4cad1690) at pg_sema.c:387 #2 0x006bde9b in LWLockAcquire (lock=0x7f7b44583fe4, mode=LW_EXCLUSIVE) at lwlock.c:1288 #3 0x0049a625 in _hash_getbuf_with_strategy (rel=rel@entry=0x7f7b4e0c1908, blkno=blkno@entry=12, access=access@entry=2, flags=flags@entry=1, bstrategy=bstrategy@entry=0x0) at hashpage.c:252 #4 0x00498040 in _hash_freeovflpage (rel=rel@entry=0x7f7b4e0c1908, bucketbuf=bucketbuf@entry=12796, ovflbuf=ovflbuf@entry=236, wbuf=wbuf@entry=13166, itups=itups@entry=0x7ffc693fc800, itup_offsets=itup_offsets@entry=0x13067d0, tups_size=0x7ffc693fd4c0, nitups=34, bstrategy=0x0) at hashovfl.c:517 #5 0x00499985 in _hash_squeezebucket (rel=rel@entry=0x7f7b4e0c1908, bucket=bucket@entry=7, bucket_blkno=bucket_blkno@entry=10, bucket_buf=bucket_buf@entry=12796, bstrategy=bstrategy@entry=0x0) at hashovfl.c:1010 #6 0x00496caf in hashbucketcleanup (rel=rel@entry=0x7f7b4e0c1908, bucket_buf=bucket_buf@entry=12796, bucket_blkno=bucket_blkno@entry=10, bstrategy=bstrategy@entry=0x0, maxbucket=22, highmask=31, lowmask=15, tuples_removed=0x0, num_index_tuples=0x0, bucket_has_garbage=1 '\001', delay=0 '\000', callback=0x0, callback_state=0x0) at hash.c:937 #7 0x0049b353 in _hash_expandtable (rel=rel@entry=0x7f7b4e0c1908, metabuf=metabuf@entry=6179) at hashpage.c:785 #8 0x00497bae in _hash_doinsert (rel=rel@entry=0x7f7b4e0c1908, itup=itup@entry=0x13060b8) at hashinsert.c:313 #9 0x0049617f in hashinsert (rel=0x7f7b4e0c1908, values=, isnull=, ht_ctid=0x1305f7c, heapRel=, checkUnique=) at hash.c:247 #10 0x005bf060 in ExecInsertIndexTuples (slot=slot@entry=0x1304a70, tupleid=tupleid@entry=0x1305f7c, estate=estate@entry=0x13043b0, noDupErr=noDupErr@entry=0 '\000', specConflict=specConflict@entry=0x0, arbiterIndexes=arbiterIndexes@entry=0x0) at execIndexing.c:388 #11 0x005dd011 in ExecInsert (canSetTag=1 '\001', estate=0x13043b0, onconflict=, arbiterIndexes=0x0, planSlot=0x1304a70, slot=0x1304a70, mtstate=0x1304600) at nodeModifyTable.c:481 #12 ExecModifyTable (node=node@entry=0x1304600) at nodeModifyTable.c:1496 #13 0x005c3948 in ExecProcNode (node=node@entry=0x1304600) at execProcnode.c:396 #14 0x005bfdbf in ExecutePlan (dest=0x12f4bd8, direction=, numberTuples=0, sendTuples=, operation=CMD_INSERT, use_parallel_mode=, planstate=0x1304600, estate=0x13043b0) at execMain.c:1567 #15 standard_ExecutorRun (queryDesc=0x1303fa0, direction=, count=0) at execMain.c:338 #16 0x006ce669 in ProcessQuery (plan=, sourceText=0x12ccf50 "INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (352, 76, 1305123, -1604, CURRENT_TIMESTAMP);", params=0x0, dest=0x12f4bd8, completionTag=0x7ffc69400c30 "") at pquery.c:187 #17 0x006ce89c in PortalRunMulti (portal=portal@entry=0x1271fe0, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12f4bd8, ---Type to continue, or q to quit--- altdest=altdest@entry=0x12f4bd8, completionTag=completionTag@entry=0x7ffc69400c30 "") at pquery.c:1303 #18 0x006cf34e in PortalRun (portal=portal@entry=0x1271fe0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x12f4bd8, altdest=altdest@entry=0x12f4bd8, completionTag=completionTag@entry=0x7ffc69400c30 "") at pquery.c:815 #19 0x006cc45b in exec_simple_query ( query_string=0x12ccf50 "INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (352, 76, 1305123, -1604, CURRENT_TIMESTAMP);") at postgres.c:1094 #20 PostgresMain (argc=,
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 24/08/16 16:52, Mark Kirkwood wrote: On 24/08/16 16:33, Amit Kapila wrote: On Wed, Aug 24, 2016 at 9:53 AM, Mark Kirkwoodwrote: On 24/08/16 15:36, Amit Kapila wrote: On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood wrote: Can you get the call stacks? For every stuck backend? (just double checking)... Yeah. Cool, I managed to reproduce with a reduced workload of 4 backends, then noticed that the traces for 3 of 'em were all the same. So I've attached the 2 unique ones, plus noted what pg_stat_activity thought the wait event was, in case that is useful. ...actually I was wrong there, only 2 of them were the same. So I've attached a new log of bt's of them all. $ pgbench -c 4 -T600 bench postgres=# SELECT pid, state,now()-xact_start AS wait,wait_event_type,wait_event,query FROM pg_stat_activity WHERE datname='bench' ORDER BY wait DESC; pid | state | wait | wait_event_type | wait_event | query ---++-+-++- --- 16549 | active | 00:06:00.252146 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (190, 24, 9513890, -2566, C URRENT_TIMESTAMP); 16548 | active | 00:06:00.235003 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (445, 3, 5688076, 1021, CUR RENT_TIMESTAMP); 16547 | active | 00:06:00.218032 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (803, 97, 6871273, -35, CUR RENT_TIMESTAMP); 16546 | active | 00:06:00.192543 | Lock| transactionid | UPDATE pgbench_branches SET bbalance = bbalance + -2823 WHERE bid = 3; (4 rows) $ gdb 16546 (gdb) bt #0 0x7f51daa3a133 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84 #1 0x006abdb6 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7fffd197d3a0, cur_timeout=-1, set=0xdc9288) at latch.c:987 #2 WaitEventSetWait (set=set@entry=0xdc9288, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7fffd197d3a0, nevents=nevents@entry=1) at latch.c:941 #3 0x006ac1ca in WaitLatchOrSocket (latch=0x7f51d9fd969c, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0) at latch.c:347 #4 0x006ac27d in WaitLatch (latch=, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0) at latch.c:302 #5 0x006bb5e3 in ProcSleep (locallock=locallock@entry=0xd3d690, lockMethodTable=lockMethodTable@entry=0x9149a0 ) at proc.c:1219 #6 0x006b639d in WaitOnLock (locallock=locallock@entry=0xd3d690, owner=owner@entry=0xdc4cb0) at lock.c:1703 #7 0x006b7a77 in LockAcquireExtended (locktag=locktag@entry=0x7fffd197d700, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:998 #8 0x006b7d51 in LockAcquire (locktag=locktag@entry=0x7fffd197d700, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000') at lock.c:688 #9 0x006b58fe in XactLockTableWait (xid=xid@entry=7667, rel=rel@entry=0x7f51db5c6398, ctid=ctid@entry=0x7fffd197d844, oper=oper@entry=XLTW_Update) at lmgr.c:587 #10 0x004a5d29 in heap_update (relation=relation@entry=0x7f51db5c6398, otid=otid@entry=0x7fffd197d9e0, newtup=newtup@entry=0xd4b3d0, cid=2, crosscheck=0x0, wait=wait@entry=1 '\001', hufd=0x7fffd197d8f0, lockmode=0x7fffd197d8ec) at heapam.c:3755 #11 0x005dbeaa in ExecUpdate (tupleid=tupleid@entry=0x7fffd197d9e0, oldtuple=oldtuple@entry=0x0, slot=slot@entry=0xd4adb0, planSlot=planSlot@entry=0xdc8ea0, epqstate=epqstate@entry=0xdc76c8, estate=estate@entry=0xdc73d0, canSetTag=1 '\001') at nodeModifyTable.c:922 #12 0x005dc633 in ExecModifyTable (node=node@entry=0xdc7620) at nodeModifyTable.c:1501 #13 0x005c3948 in ExecProcNode (node=node@entry=0xdc7620) at execProcnode.c:396 #14 0x005bfdbf in ExecutePlan (dest=0xdc9e60, direction=, numberTuples=0, sendTuples=, operation=CMD_UPDATE, use_parallel_mode=, planstate=0xdc7620, estate=0xdc73d0) at execMain.c:1567 #15 standard_ExecutorRun (queryDesc=0xdc6fc0, direction=, count=0) at execMain.c:338 #16 0x006ce669 in ProcessQuery (plan=, sourceText=0xd8ff70 "UPDATE pgbench_branches SET bbalance = bbalance + -2823 WHERE bid = 3;", params=0x0, dest=0xdc9e60, completionTag=0x7fffd197dfc0 "") at pquery.c:187 #17 0x006ce89c in PortalRunMulti (portal=portal@entry=0xd35000, isTopLevel=isTopLevel@entry=1 '\001',
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janeswrote: > Hi Amit, > > Thanks for working on this. > > When building with --enable-cassert, I get compiler warning: > > hash.c: In function 'hashbucketcleanup': > hash.c:722: warning: 'new_bucket' may be used uninitialized in this function > This warning is from concurrent index patch. I will fix it and post the patch on that thread. > > After an intentionally created crash, I get an Assert triggering: > > TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & > (1<<((bitmapbit)%32", File: "hashovfl.c", Line: 553) > > freep[0] is zero and bitmapbit is 16. > Here what is happening is that when it tries to clear the bitmapbit, it expects it to be set. Now, I think the reason for why it didn't find the bit as set could be that after the new overflow page is added and the bit corresponding to it is set, you might have crashed the system and the replay would not have set the bit. Then while freeing the overflow page it can hit the Assert as mentioned by you. I think the problem here could be that I am using REGBUF_STANDARD to log the bitmap page updates which seems to be causing the issue. As bitmap page doesn't follow the standard page layout, it would have omitted the actual contents while taking full page image and then during replay, it would not have set the bit, because page doesn't need REDO. I think here the fix is to use REGBUF_NO_IMAGE as we use for vm buffers. If you can send me the detailed steps for how you have produced the problem, then I can verify after fixing whether you are seeing the same problem or something else. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 24/08/16 16:33, Amit Kapila wrote: On Wed, Aug 24, 2016 at 9:53 AM, Mark Kirkwoodwrote: On 24/08/16 15:36, Amit Kapila wrote: On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood wrote: Can you get the call stacks? For every stuck backend? (just double checking)... Yeah. Cool, I managed to reproduce with a reduced workload of 4 backends, then noticed that the traces for 3 of 'em were all the same. So I've attached the 2 unique ones, plus noted what pg_stat_activity thought the wait event was, in case that is useful. Cheers Mark $ pgbench -c 4 -T600 bench postgres=# SELECT pid, state,now()-xact_start AS wait,wait_event_type,wait_event,query FROM pg_stat_activity WHERE datname='bench' ORDER BY wait DESC; pid | state | wait | wait_event_type | wait_event | query ---++-+-++- --- 16549 | active | 00:06:00.252146 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (190, 24, 9513890, -2566, C URRENT_TIMESTAMP); 16548 | active | 00:06:00.235003 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (445, 3, 5688076, 1021, CUR RENT_TIMESTAMP); 16547 | active | 00:06:00.218032 | LWLockTranche | buffer_content | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (803, 97, 6871273, -35, CUR RENT_TIMESTAMP); 16546 | active | 00:06:00.192543 | Lock| transactionid | UPDATE pgbench_branches SET bbalance = bbalance + -2823 WHERE bid = 3; (4 rows) postgres@zmori:~$ gdb -p 16546 0x7f51daa3a133 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84 84 ../sysdeps/unix/syscall-template.S: No such file or directory. (gdb) bt #0 0x7f51daa3a133 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84 #1 0x006abdb6 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7fffd197d3a0, cur_timeout=-1, set=0xdc9288) at latch.c:987 #2 WaitEventSetWait (set=set@entry=0xdc9288, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7fffd197d3a0, nevents=nevents@entry=1) at latch.c:941 #3 0x006ac1ca in WaitLatchOrSocket (latch=0x7f51d9fd969c, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0) at latch.c:347 #4 0x006ac27d in WaitLatch (latch=, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0) at latch.c:302 #5 0x006bb5e3 in ProcSleep (locallock=locallock@entry=0xd3d690, lockMethodTable=lockMethodTable@entry=0x9149a0 ) at proc.c:1219 #6 0x006b639d in WaitOnLock (locallock=locallock@entry=0xd3d690, owner=owner@entry=0xdc4cb0) at lock.c:1703 #7 0x006b7a77 in LockAcquireExtended ( locktag=locktag@entry=0x7fffd197d700, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000', reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:998 #8 0x006b7d51 in LockAcquire (locktag=locktag@entry=0x7fffd197d700, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000') at lock.c:688 #9 0x006b58fe in XactLockTableWait (xid=xid@entry=7667, rel=rel@entry=0x7f51db5c6398, ctid=ctid@entry=0x7fffd197d844, oper=oper@entry=XLTW_Update) at lmgr.c:587 #10 0x004a5d29 in heap_update (relation=relation@entry=0x7f51db5c6398, otid=otid@entry=0x7fffd197d9e0, newtup=newtup@entry=0xd4b3d0, cid=2, crosscheck=0x0, wait=wait@entry=1 '\001', hufd=0x7fffd197d8f0, lockmode=0x7fffd197d8ec) at heapam.c:3755 #11 0x005dbeaa in ExecUpdate (tupleid=tupleid@entry=0x7fffd197d9e0, oldtuple=oldtuple@entry=0x0, slot=slot@entry=0xd4adb0, planSlot=planSlot@entry=0xdc8ea0, epqstate=epqstate@entry=0xdc76c8, estate=estate@entry=0xdc73d0, canSetTag=1 '\001') at nodeModifyTable.c:922 #12 0x005dc633 in ExecModifyTable (node=node@entry=0xdc7620) at nodeModifyTable.c:1501 #13 0x005c3948 in ExecProcNode (node=node@entry=0xdc7620) at execProcnode.c:396 #14 0x005bfdbf in ExecutePlan (dest=0xdc9e60, direction=, numberTuples=0, sendTuples=, operation=CMD_UPDATE, use_parallel_mode=, planstate=0xdc7620, estate=0xdc73d0) at execMain.c:1567 #15 standard_ExecutorRun (queryDesc=0xdc6fc0, direction=, count=0) at execMain.c:338 #16 0x006ce669 in ProcessQuery (plan=, sourceText=0xd8ff70 "UPDATE pgbench_branches SET bbalance = bbalance + -2823 WHERE bid = 3;", params=0x0, dest=0xdc9e60, completionTag=0x7fffd197dfc0 "") ---Type to continue, or q to quit---
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Wed, Aug 24, 2016 at 9:53 AM, Mark Kirkwoodwrote: > On 24/08/16 15:36, Amit Kapila wrote: >> >> On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwood >> wrote: >>> >> >> Can you get the call stacks? >> > > For every stuck backend? (just double checking)... Yeah. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 24/08/16 15:36, Amit Kapila wrote: On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwoodwrote: On 24/08/16 12:09, Mark Kirkwood wrote: On 23/08/16 15:24, Amit Kapila wrote: Thoughts? Note - To use this patch, first apply latest version of concurrent hash index patch [2]. [1] - https://commitfest.postgresql.org/10/647/ [2] - https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com Firstly - really nice! Patches applied easily etc to latest version 10 checkout. I thought I'd test by initializing pgbench schema, adding a standby, then adding some hash indexes and running pgbench: Looking on the standby: bench=# \d pgbench_accounts Table "public.pgbench_accounts" Column | Type | Modifiers --+---+--- aid | integer | not null bid | integer | abalance | integer | filler | character(84) | Indexes: "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) "pgbench_accounts_bid" hash (bid)< bench=# \d pgbench_history Table "public.pgbench_history" Column |Type | Modifiers +-+--- tid| integer | bid| integer | aid| integer | delta | integer | mtime | timestamp without time zone | filler | character(22) | Indexes: "pgbench_history_bid" hash (bid) <= they have been replicated there ok. However I'm seeing a hang on the master after a while: bench=# SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; datname | application_name | state | wait | query -+--++-+ | walreceiver | idle | | bench | pgbench | active | 00:31:38.367467 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973, -3868, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:38.215378 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 3954814, 2091, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.991056 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 8355237, 3438, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.619798 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 4839994, -2443, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.544196 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 4053, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.334504 | UPDATE pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33; bench | pgbench | active | 00:31:35.234112 | UPDATE pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38; bench | pgbench | active | 00:31:34.434676 | UPDATE pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33; bench | psql | active | 00:00:00| SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; (10 rows) but no errors in the logs, any thoughts? Can you get the call stacks? For every stuck backend? (just double checking)... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Wed, Aug 24, 2016 at 8:54 AM, Mark Kirkwoodwrote: > On 24/08/16 12:09, Mark Kirkwood wrote: >> >> On 23/08/16 15:24, Amit Kapila wrote: >>> >>> >>> Thoughts? >>> >>> Note - To use this patch, first apply latest version of concurrent >>> hash index patch [2]. >>> >>> [1] - https://commitfest.postgresql.org/10/647/ >>> [2] - >>> https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com >>> >>> >> >> Firstly - really nice! Patches applied easily etc to latest version 10 >> checkout. >> >> I thought I'd test by initializing pgbench schema, adding a standby, then >> adding some hash indexes and running pgbench: >> >> Looking on the standby: >> >> bench=# \d pgbench_accounts >>Table "public.pgbench_accounts" >> Column | Type | Modifiers >> --+---+--- >> aid | integer | not null >> bid | integer | >> abalance | integer | >> filler | character(84) | >> Indexes: >> "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) >> "pgbench_accounts_bid" hash (bid)< >> >> bench=# \d pgbench_history >> Table "public.pgbench_history" >> Column |Type | Modifiers >> +-+--- >> tid| integer | >> bid| integer | >> aid| integer | >> delta | integer | >> mtime | timestamp without time zone | >> filler | character(22) | >> Indexes: >> "pgbench_history_bid" hash (bid) <= >> >> >> they have been replicated there ok. >> >> However I'm seeing a hang on the master after a while: >> >> bench=# SELECT datname,application_name,state,now()-xact_start AS >> wait,query FROM pg_stat_activity ORDER BY wait DESC; >> datname | application_name | state | wait | query >> >> -+--++-+ >> | walreceiver | idle | | >> bench | pgbench | active | 00:31:38.367467 | INSERT INTO >> pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973, >> -3868, CURRENT_TIMESTAMP); >> bench | pgbench | active | 00:31:38.215378 | INSERT INTO >> pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 3954814, >> 2091, CURRENT_TIMESTAMP); >> bench | pgbench | active | 00:31:35.991056 | INSERT INTO >> pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 8355237, >> 3438, CURRENT_TIMESTAMP); >> bench | pgbench | active | 00:31:35.619798 | INSERT INTO >> pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 4839994, >> -2443, CURRENT_TIMESTAMP); >> bench | pgbench | active | 00:31:35.544196 | INSERT INTO >> pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 4053, >> CURRENT_TIMESTAMP); >> bench | pgbench | active | 00:31:35.334504 | UPDATE >> pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33; >> bench | pgbench | active | 00:31:35.234112 | UPDATE >> pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38; >> bench | pgbench | active | 00:31:34.434676 | UPDATE >> pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33; >> bench | psql | active | 00:00:00| SELECT >> datname,application_name,state,now()-xact_start AS wait,query FROM >> pg_stat_activity ORDER BY wait DESC; >> (10 rows) >> >> but no errors in the logs, any thoughts? > Can you get the call stacks? > > > FWIW, retesting with the wal logging patch removed (i.e leaving the > concurrent hast one) works fine. > Okay, information noted. Thanks for testing and showing interest in the patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 24/08/16 12:09, Mark Kirkwood wrote: On 23/08/16 15:24, Amit Kapila wrote: Thoughts? Note - To use this patch, first apply latest version of concurrent hash index patch [2]. [1] - https://commitfest.postgresql.org/10/647/ [2] - https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com Firstly - really nice! Patches applied easily etc to latest version 10 checkout. I thought I'd test by initializing pgbench schema, adding a standby, then adding some hash indexes and running pgbench: Looking on the standby: bench=# \d pgbench_accounts Table "public.pgbench_accounts" Column | Type | Modifiers --+---+--- aid | integer | not null bid | integer | abalance | integer | filler | character(84) | Indexes: "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) "pgbench_accounts_bid" hash (bid)< bench=# \d pgbench_history Table "public.pgbench_history" Column |Type | Modifiers +-+--- tid| integer | bid| integer | aid| integer | delta | integer | mtime | timestamp without time zone | filler | character(22) | Indexes: "pgbench_history_bid" hash (bid) <= they have been replicated there ok. However I'm seeing a hang on the master after a while: bench=# SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; datname | application_name | state | wait | query -+--++-+ | walreceiver | idle | | bench | pgbench | active | 00:31:38.367467 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973, -3868, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:38.215378 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 3954814, 2091, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.991056 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 8355237, 3438, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.619798 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 4839994, -2443, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.544196 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 4053, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.334504 | UPDATE pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33; bench | pgbench | active | 00:31:35.234112 | UPDATE pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38; bench | pgbench | active | 00:31:34.434676 | UPDATE pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33; bench | psql | active | 00:00:00| SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; (10 rows) but no errors in the logs, any thoughts? FWIW, retesting with the wal logging patch removed (i.e leaving the concurrent hast one) works fine. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 23/08/16 15:24, Amit Kapila wrote: Thoughts? Note - To use this patch, first apply latest version of concurrent hash index patch [2]. [1] - https://commitfest.postgresql.org/10/647/ [2] - https://www.postgresql.org/message-id/caa4ek1lkq_udism-z2dq6cuvjh3db5fnfnnezzbpsrjw0ha...@mail.gmail.com Firstly - really nice! Patches applied easily etc to latest version 10 checkout. I thought I'd test by initializing pgbench schema, adding a standby, then adding some hash indexes and running pgbench: Looking on the standby: bench=# \d pgbench_accounts Table "public.pgbench_accounts" Column | Type | Modifiers --+---+--- aid | integer | not null bid | integer | abalance | integer | filler | character(84) | Indexes: "pgbench_accounts_pkey" PRIMARY KEY, btree (aid) "pgbench_accounts_bid" hash (bid)< bench=# \d pgbench_history Table "public.pgbench_history" Column |Type | Modifiers +-+--- tid| integer | bid| integer | aid| integer | delta | integer | mtime | timestamp without time zone | filler | character(22) | Indexes: "pgbench_history_bid" hash (bid) <= they have been replicated there ok. However I'm seeing a hang on the master after a while: bench=# SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; datname | application_name | state | wait | query -+--++-+ | walreceiver | idle | | bench | pgbench | active | 00:31:38.367467 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (921, 38, 251973, -3868, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:38.215378 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (280, 95, 3954814, 2091, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.991056 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (447, 33, 8355237, 3438, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.619798 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (134, 16, 4839994, -2443, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.544196 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (37, 73, 9620119, 4053, CURRENT_TIMESTAMP); bench | pgbench | active | 00:31:35.334504 | UPDATE pgbench_branches SET bbalance = bbalance + -2954 WHERE bid = 33; bench | pgbench | active | 00:31:35.234112 | UPDATE pgbench_branches SET bbalance = bbalance + -713 WHERE bid = 38; bench | pgbench | active | 00:31:34.434676 | UPDATE pgbench_branches SET bbalance = bbalance + -921 WHERE bid = 33; bench | psql | active | 00:00:00| SELECT datname,application_name,state,now()-xact_start AS wait,query FROM pg_stat_activity ORDER BY wait DESC; (10 rows) but no errors in the logs, any thoughts? Cheers Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
Hi Amit, Thanks for working on this. When building with --enable-cassert, I get compiler warning: hash.c: In function 'hashbucketcleanup': hash.c:722: warning: 'new_bucket' may be used uninitialized in this function After an intentionally created crash, I get an Assert triggering: TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & (1<<((bitmapbit)%32", File: "hashovfl.c", Line: 553) freep[0] is zero and bitmapbit is 16. With this backtrace: (gdb) bt #0 0x003838c325e5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x003838c33dc5 in abort () at abort.c:92 #2 0x0081a8fd in ExceptionalCondition (conditionName=, errorType=, fileName=, lineNumber=) at assert.c:54 #3 0x004a4199 in _hash_freeovflpage (rel=0x7f3f745d86b8, bucketbuf=198, ovflbuf=199, wbuf=198, itups=0x7ffc258fa090, itup_offsets=0x126e8a8, tups_size=0x7ffc258f93d0, nitups=70, bstrategy=0x12ba320) at hashovfl.c:553 #4 0x004a4c32 in _hash_squeezebucket (rel=, bucket=38, bucket_blkno=56, bucket_buf=198, bstrategy=0x12ba320) at hashovfl.c:1010 #5 0x004a042a in hashbucketcleanup (rel=0x7f3f745d86b8, bucket_buf=198, bucket_blkno=56, bstrategy=0x12ba320, maxbucket=96, highmask=127, lowmask=63, tuples_removed=0x7ffc258fc1c8, num_index_tuples=0x7ffc258fc1c0, bucket_has_garbage=0 '\000', delay=1 '\001', callback=0x5e9bd0 , callback_state=0x126e248) at hash.c:937 #6 0x004a07e7 in hashbulkdelete (info=0x7ffc258fc2b0, stats=0x0, callback=0x5e9bd0 , callback_state=0x126e248) at hash.c:580 #7 0x005e98c5 in lazy_vacuum_index (indrel=0x7f3f745d86b8, stats=0x126ecc0, vacrelstats=0x126e248) at vacuumlazy.c:1599 #8 0x005ea7f9 in lazy_scan_heap (onerel=, options=, params=0x12ba290, bstrategy=) at vacuumlazy.c:1291 #9 lazy_vacuum_rel (onerel=, options=, params=0x12ba290, bstrategy=) at vacuumlazy.c:255 #10 0x005e8939 in vacuum_rel (relid=17329, relation=0x7ffc258fcbd0, options=99, params=0x12ba290) at vacuum.c:1399 #11 0x005e8d01 in vacuum (options=99, relation=0x7ffc258fcbd0, relid=, params=0x12ba290, va_cols=0x0, bstrategy=, isTopLevel=1 '\001') at vacuum.c:307 #12 0x006a07f1 in autovacuum_do_vac_analyze () at autovacuum.c:2823 #13 do_autovacuum () at autovacuum.c:2341 #14 0x006a0f9c in AutoVacWorkerMain (argc=, argv=) at autovacuum.c:1656 #15 0x006a1116 in StartAutoVacWorker () at autovacuum.c:1461 #16 0x006afb00 in StartAutovacuumWorker (postgres_signal_arg=) at postmaster.c:5323 #17 sigusr1_handler (postgres_signal_arg=) at postmaster.c:5009 #18 #19 0x003838ce1503 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82 #20 0x006b0ec0 in ServerLoop (argc=, argv=) at postmaster.c:1657 #21 PostmasterMain (argc=, argv=) at postmaster.c:1301 #22 0x00632e88 in main (argc=4, argv=0x11f4d50) at main.c:228 Cheers, Jeff
Re: [HACKERS] Write Ahead Logging for Hash Indexes
Hi All, Following are the steps that i have followed to verify the WAL Logging of hash index, 1. I used Mithun's patch to improve coverage of hash index code [1] to verify the WAL Logging of hash index. Firstly i have confirmed if all the XLOG records associated with hash index are being covered or not using this patch. In case if any of the XLOG record for hash index operation is not being covered i have added a testcase for it. I have found that one of the XLOG record 'XLOG_HASH_MOVE_PAGE_CONTENTS' was not being covered and added a small testcase for the same. The patch for this is available @ [2]. 2. I executed the regression test suite and found all the hash indexes that are getting created as a part of regression test suite using the below query. SELECT t.relname index_name, t.oid FROM pg_class t JOIN pg_am idx ON idx.oid = t.relam WHERE idx.amname = 'hash'; 3. Thirdly, I have calculated the number of pages associated with each hash index and compared every page of hash index on master and standby server using pg_filedump tool. As for example if the number of pages associated with 'con_hash_index' is 10 then here is what i did, On master: - select pg_relation_filepath('con_hash_index'); pg_relation_filepath -- base/16408/16433 (1 row) ./pg_filedump -if -R 0 9 /home/edb/git-clone-postgresql/postgresql/TMP/postgres/master/base/16408/16433 > /tmp/file1 On Slave: --- select pg_relation_filepath('con_hash_index'); pg_relation_filepath -- base/16408/16433 (1 row) ./pg_filedump -if -R 0 9 /home/edb/git-clone-postgresql/postgresql/TMP/postgres/standby/base/16408/16433 > /tmp/file2 compared file1 and file2 using some diff tool. Following are the list of hash indexes that got created inside regression database when regression test suite was executed on a master server. hash_i4_index hash_name_index hash_txt_index hash_f8_index con_hash_index hash_idx In short, this is all i did and found no issues during testing. Please let me know if you need any further details. I would like to Thank Amit for his support and guidance during the testing phase. [1] - https://www.postgresql.org/message-id/CAA4eK1JOBX%3DYU33631Qh-XivYXtPSALh514%2BjR8XeD7v%2BK3r_Q%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAE9k0PkNjryhSiG53mjnKFhi%2BMipJMjSa%3DYkH-UeW3bfr1HPJQ%40mail.gmail.com With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Tue, Aug 23, 2016 at 8:54 AM, Amit Kapilawrote: > $SUBJECT will make hash indexes reliable and usable on standby. > AFAIU, currently hash indexes are not recommended to be used in > production mainly because they are not crash-safe and with this patch, > I hope we can address that limitation and recommend them for use in > production. > > This patch is built on my earlier patch [1] of making hash indexes > concurrent. The main reason for doing so is that the earlier patch > allows to complete the split operation and used light-weight locking > due to which operations can be logged at granular level. > > WAL for different operations: > > This has been explained in README as well, but I am again writing it > here for the ease of people. > > .. > One of the challenge in writing this patch was that the current code > was not written with a mindset that we need to write WAL for different > operations. Typical example is _hash_addovflpage() where pages are > modified across different function calls and all modifications needs > to be done atomically, so I have to refactor some code so that the > operations can be logged sensibly. > This patch has not done handling for OldSnapshot. Previously, we haven't done TestForOldSnapshot() checks in hash index as they were not logged, but now with this patch, it makes sense to perform such checks. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers