Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 16, 2017 at 1:50 PM, Dilip Kumar wrote:
> fixed

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 16, 2017 at 8:26 PM, Emre Hasegeli wrote:
>> Hopefully, this time I got it correct. Since I am unable to reproduce
>> the issue so I will again need your help in verifying the fix.
>
> It is not crashing with the new patch. Thank you.

Thanks for verifying.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 16, 2017 at 8:42 PM, Robert Haas wrote:
> Thanks for confirming. Some review comments on v2:
>
> +if (istate->pagetable)

fixed

>
> Please compare explicitly to InvalidDsaPointer.
>
> +if (iterator->ptbase)
> +ptbase = iterator->ptbase->ptentry;
> +if (iterator->ptpages)
> +idxpages = iterator->ptpages->index;
> +if (iterator->ptchunks)
> +idxchunks = iterator->ptchunks->index;
>
> Similarly.

fixed

Also fixed at
+ if (ptbase)
+ pg_atomic_init_u32(&ptbase->refcount, 0);

>
> Dilip, please also provide a proposed commit message describing what
> this is fixing. Is it just the TBM_EMPTY case, or is there anything
> else?

Okay, I have added the commit message in the patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

fix_tbm_empty_v3.patch
Description: Binary data
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 16, 2017 at 10:56 AM, Emre Hasegeli wrote:
>> Hopefully, this time I got it correct. Since I am unable to reproduce
>> the issue so I will again need your help in verifying the fix.
>
> It is not crashing with the new patch. Thank you.

Thanks for confirming. Some review comments on v2:

+if (istate->pagetable)

Please compare explicitly to InvalidDsaPointer.

+if (iterator->ptbase)
+ptbase = iterator->ptbase->ptentry;
+if (iterator->ptpages)
+idxpages = iterator->ptpages->index;
+if (iterator->ptchunks)
+idxchunks = iterator->ptchunks->index;

Similarly.

Dilip, please also provide a proposed commit message describing what
this is fixing. Is it just the TBM_EMPTY case, or is there anything
else?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
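The review comment about comparing explicitly to InvalidDsaPointer can be illustrated with a small standalone sketch. It does not use the real PostgreSQL headers: demo_dsa_pointer, DEMO_INVALID_DSA_POINTER, demo_get_address and DemoSharedState are hypothetical stand-ins, and the 1-based offset scheme is an assumption made only so the example runs on its own. The point is that a DSA pointer is an integer-like handle, so the check reads more clearly when it names the invalid value instead of relying on truthiness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DSA pointer: an integer handle, not a C pointer. */
typedef uint64_t demo_dsa_pointer;
#define DEMO_INVALID_DSA_POINTER ((demo_dsa_pointer) 0)

/* Hypothetical shared iterator state that may or may not have a page table. */
typedef struct DemoSharedState
{
    demo_dsa_pointer pagetable;
} DemoSharedState;

/*
 * Stand-in for resolving a handle to an address.  In this sketch a handle
 * is simply a 1-based index into a caller-supplied array.
 */
int *
demo_get_address(int *area, demo_dsa_pointer dp)
{
    assert(dp != DEMO_INVALID_DSA_POINTER);
    return &area[dp - 1];
}

/* Resolve the page table only when the handle is explicitly valid. */
int *
demo_pagetable_or_null(int *area, const DemoSharedState *istate)
{
    if (istate->pagetable != DEMO_INVALID_DSA_POINTER)
        return demo_get_address(area, istate->pagetable);
    return NULL;                /* e.g. an empty bitmap with no page table */
}
```

Callers then have to handle a NULL result, which is the empty-bitmap case discussed in this thread.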
Re: [HACKERS] Parallel Bitmap scans a bit broken
> Hopefully, this time I got it correct. Since I am unable to reproduce
> the issue so I will again need your help in verifying the fix.

It is not crashing with the new patch. Thank you.
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 16, 2017 at 5:14 PM, Dilip Kumar wrote:
> pg_atomic_write_u32_impl(val=0) at generic.h:57, queue =
> 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
>>> * frame #0: 0x000100caf314 postgres`tbm_prepare_shared_iterate
>>> [inlined] pg_atomic_write_u32_impl(val=0) at generic.h:57 [opt]
>>> frame #1: 0x000100caf314 postgres`tbm_prepare_shared_iterate
>>> [inlined] pg_atomic_init_u32_impl(val_=0) at generic.h:163 [opt]
>>> frame #2: 0x000100caf314 postgres`tbm_prepare_shared_iterate
>>> [inlined] pg_atomic_init_u32(val=0) + 17 at atomics.h:237 [opt]
>
> By looking at the call stack I got the problem location. I am
> reviewing other parts of the code if there are the similar mistake at
> other places. Soon I will post the patch. Thanks for the help.

Based on the call stack I have tried to fix the issue. The problem is
that there were some uninitialized pointer accesses (in some special
cases, i.e. TBM_EMPTY, when the pagetable is not created at all).
fix_tbm_empty.patch fixed some of them but introduced a new one, which
is the one you are seeing in your call stack.

Hopefully, this time I got it correct. Since I am unable to reproduce
the issue, I will again need your help in verifying the fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

fix_tbm_empty_v2.patch
Description: Binary data
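The crash at address 0x0 inside the atomic initialisation is consistent with initialising a reference count through a base pointer that was never set up. A minimal sketch of the guarded form follows. It uses C11 atomics instead of PostgreSQL's pg_atomic_* API so that it compiles standalone, and DemoPTEntryArray and demo_init_refcount are hypothetical names, not the ones in tidbitmap.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared page-table array header. */
typedef struct DemoPTEntryArray
{
    atomic_uint refcount;       /* number of iterators attached */
} DemoPTEntryArray;

/*
 * Initialise the reference count only when the array exists.  For an empty
 * bitmap no array is allocated, so ptbase is NULL and must not be touched.
 * Returns true if the count was initialised.
 */
bool
demo_init_refcount(DemoPTEntryArray *ptbase)
{
    if (ptbase == NULL)
        return false;
    atomic_init(&ptbase->refcount, 0);
    return true;
}
```

Without the NULL check, the write to ptbase->refcount lands at offset zero of a null pointer, which would match the EXC_BAD_ACCESS (address=0x0) in the backtrace.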
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 16, 2017 at 3:52 PM, Emre Hasegeli wrote:
>> * thread #1: tid = 0x51828fd, 0x000100caf314
>> postgres`tbm_prepare_shared_iterate [inlined]
>> pg_atomic_write_u32_impl(val=0) at generic.h:57, queue =
>> 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
>> * frame #0: 0x000100caf314 postgres`tbm_prepare_shared_iterate
>> [inlined] pg_atomic_write_u32_impl(val=0) at generic.h:57 [opt]
>> frame #1: 0x000100caf314 postgres`tbm_prepare_shared_iterate
>> [inlined] pg_atomic_init_u32_impl(val_=0) at generic.h:163 [opt]
>> frame #2: 0x000100caf314 postgres`tbm_prepare_shared_iterate
>> [inlined] pg_atomic_init_u32(val=0) + 17 at atomics.h:237 [opt]

By looking at the call stack I found the problem location. I am
reviewing other parts of the code to see whether there are similar
mistakes elsewhere. I will post the patch soon. Thanks for the help.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Bitmap scans a bit broken
> Are you getting the crash with the same test case? Yes. Here is the new backtrace: > * thread #1: tid = 0x51828fd, 0x000100caf314 > postgres`tbm_prepare_shared_iterate [inlined] pg_atomic_write_u32_impl(val=0) > at generic.h:57, queue = 'com.apple.main-thread', stop reason = > EXC_BAD_ACCESS (code=1, address=0x0) > * frame #0: 0x000100caf314 postgres`tbm_prepare_shared_iterate > [inlined] pg_atomic_write_u32_impl(val=0) at generic.h:57 [opt] > frame #1: 0x000100caf314 postgres`tbm_prepare_shared_iterate > [inlined] pg_atomic_init_u32_impl(val_=0) at generic.h:163 [opt] > frame #2: 0x000100caf314 postgres`tbm_prepare_shared_iterate > [inlined] pg_atomic_init_u32(val=0) + 17 at atomics.h:237 [opt] > frame #3: 0x000100caf303 > postgres`tbm_prepare_shared_iterate(tbm=) + 723 at > tidbitmap.c:875 [opt] > frame #4: 0x000100c74844 postgres`BitmapHeapNext(node=) > + 436 at nodeBitmapHeapscan.c:154 [opt] > frame #5: 0x000100c615b0 > postgres`ExecProcNode(node=0x7fdabf8189f0) + 224 at execProcnode.c:459 > [opt] > frame #6: 0x000100c76ca9 postgres`ExecGather [inlined] > gather_getnext(gatherstate=) + 520 at nodeGather.c:276 [opt] > frame #7: 0x000100c76aa1 postgres`ExecGather(node=) + > 497 at nodeGather.c:212 [opt] > frame #8: 0x000100c61692 > postgres`ExecProcNode(node=0x7fdabf818558) + 450 at execProcnode.c:541 > [opt] > frame #9: 0x000100c5cf70 postgres`standard_ExecutorRun [inlined] > ExecutePlan(estate=, planstate=, > use_parallel_mode=, operation=, numberTuples=0, > direction=, dest=) + 29 at execMain.c:1616 [opt] >frame #10: 0x000100c5cf53 > postgres`standard_ExecutorRun(queryDesc=, > direction=, count=0) + 291 at execMain.c:348 [opt] >frame #11: 0x000100dac0df > postgres`PortalRunSelect(portal=0x7fdac000b240, forward=, > count=0, dest=) + 255 at pquery.c:921 [opt] >frame #12: 0x000100dabc84 > postgres`PortalRun(portal=0x7fdac000b240, count=, > isTopLevel='\x01', dest=, altdest=, > completionTag=) + 500 at pquery.c:762 [opt] >frame #13: 0x000100da989b 
postgres`PostgresMain + 44 at > postgres.c:1101 [opt] >frame #14: 0x000100da986f postgres`PostgresMain(argc=, > argv=, dbname=, username=) + 8927 at > postgres.c:4066 [opt] >frame #15: 0x000100d2c113 postgres`PostmasterMain [inlined] BackendRun > + 7587 at postmaster.c:4317 [opt] >frame #16: 0x000100d2c0e8 postgres`PostmasterMain [inlined] > BackendStartup at postmaster.c:3989 [opt] >frame #17: 0x000100d2c0e8 postgres`PostmasterMain at postmaster.c:1729 > [opt] >frame #18: 0x000100d2c0e8 postgres`PostmasterMain(argc=, > argv=) + 7544 at postmaster.c:1337 [opt] >frame #19: 0x000100ca528f postgres`main(argc=, > argv=) + 1567 at main.c:228 [opt] >frame #20: 0x7fffb4e28255 libdyld.dylib`start + 1 >frame #21: 0x7fffb4e28255 libdyld.dylib`start + 1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 16, 2017 at 5:02 AM, Dilip Kumar wrote:
> After above fix, I am not able to reproduce. Can you give me the
> backtrace of the crash location or the dump?
>
> I am trying on the below commit
>
> commit c5832346625af4193b1242e57e7d13e66a220b38
> Author: Stephen Frost
> Date: Wed Mar 15 11:19:39 2017 -0400
>
> + https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch
> + fix_tbm_empty.patch

I forgot to mention that after the fix I am seeing this output:

postgres=# explain analyze select * from only r2 where i = 10;
QUERY PLAN
---
 Gather (cost=2880.56..9251.98 rows=1 width=4) (actual time=3.857..3.857 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Bitmap Heap Scan on r2 (cost=1880.56..8251.88 rows=1 width=4) (actual time=0.043..0.043 rows=0 loops=3)
        Recheck Cond: (i = 10)
        -> Bitmap Index Scan on r2_i_idx (cost=0.00..1880.56 rows=373694 width=0) (actual time=0.052..0.052 rows=0 loops=1)
             Index Cond: (i = 10)
 Planning time: 0.111 ms
 Execution time: 4.449 ms
(9 rows)

postgres=# select * from only r2 where i = 10;
 i
---
(0 rows)

Are you getting the crash with the same test case?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 16, 2017 at 12:56 AM, Emre Hasegeli wrote:
>> Please verify the fix.
>
> The same test with both of the patches applied still crashes for me.

After the above fix, I am not able to reproduce it. Can you give me the
backtrace of the crash location, or the dump?

I am testing on the commit below:

commit c5832346625af4193b1242e57e7d13e66a220b38
Author: Stephen Frost
Date: Wed Mar 15 11:19:39 2017 -0400

+ https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch
+ fix_tbm_empty.patch

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Bitmap scans a bit broken
> Please verify the fix.

The same test with both of the patches applied still crashes for me.
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Wed, Mar 15, 2017 at 10:21 PM, Emre Hasegeli wrote:
>> hasegeli=# create table r2 as select (random() * 3)::int as i from
>> generate_series(1, 100);
>> SELECT 100
>> hasegeli=# create index on r2 using brin (i);
>> CREATE INDEX
>> hasegeli=# analyze r2;
>> ANALYZE
>> hasegeli=# explain select * from only r2 where i = 10;
>> QUERY PLAN
>> -
>> Gather (cost=2867.50..9225.32 rows=1 width=4)
>>   Workers Planned: 2
>>   -> Parallel Bitmap Heap Scan on r2 (cost=1867.50..8225.22 rows=1 width=4)
>>        Recheck Cond: (i = 10)
>>        -> Bitmap Index Scan on r2_i_idx (cost=0.00..1867.50 rows=371082 width=0)
>>             Index Cond: (i = 10)
>> (6 rows)
>>
>> hasegeli=# select * from only r2 where i = 10;

I am able to reproduce the bug, and the attached patch fixes it.

The problem is that I was not handling the TBM_EMPTY state properly. I
remember that while reviewing the patch Robert mentioned that we might
need to handle TBM_EMPTY, and I said that since we do not handle it in
non-parallel mode, we do not need to handle it here either. But I was
wrong.

The problem is that if the state is not TBM_HASH, the code directly
assumes TBM_ONE_PAGE, which is completely wrong. I have fixed that, and
also fixed other similar locations.

Please verify the fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

fix_tbm_empty.patch
Description: Binary data
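The state-handling mistake described above (treating "not TBM_HASH" as "must be TBM_ONE_PAGE") can be sketched with a three-way dispatch. This is an illustration only: the DemoStatus enum mirrors the three state names mentioned in the thread, but demo_entries_to_visit and its return convention are hypothetical and are not taken from the patch.

```c
#include <assert.h>

/* Hypothetical mirror of the three bitmap states named in the thread. */
typedef enum DemoStatus
{
    DEMO_TBM_EMPTY,             /* no entries at all, no page table */
    DEMO_TBM_ONE_PAGE,          /* exactly one entry stored inline */
    DEMO_TBM_HASH               /* entries stored in a hash table */
} DemoStatus;

/*
 * Return how many page-table entries an iterator setup routine should
 * visit.  Each state is handled explicitly; in particular, the empty
 * state yields zero instead of falling through to the one-page branch.
 */
int
demo_entries_to_visit(DemoStatus status, int nentries)
{
    assert(nentries >= 0);

    switch (status)
    {
        case DEMO_TBM_HASH:
            return nentries;
        case DEMO_TBM_ONE_PAGE:
            return 1;
        case DEMO_TBM_EMPTY:
            return 0;
    }
    return 0;                   /* not reached for valid enum values */
}
```

A two-way `if (status == HASH) ... else ...` would return 1 for the empty state and then dereference an entry that was never created, which is the kind of access the query returning zero rows triggered here.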
Re: [HACKERS] Parallel Bitmap scans a bit broken
> With my test case, I could not crash even with this patch applied.
> Can you provide your test case?

Yes:

> hasegeli=# create table r2 as select (random() * 3)::int as i from
> generate_series(1, 100);
> SELECT 100
> hasegeli=# create index on r2 using brin (i);
> CREATE INDEX
> hasegeli=# analyze r2;
> ANALYZE
> hasegeli=# explain select * from only r2 where i = 10;
> QUERY PLAN
> -
> Gather (cost=2867.50..9225.32 rows=1 width=4)
>   Workers Planned: 2
>   -> Parallel Bitmap Heap Scan on r2 (cost=1867.50..8225.22 rows=1 width=4)
>        Recheck Cond: (i = 10)
>        -> Bitmap Index Scan on r2_i_idx (cost=0.00..1867.50 rows=371082 width=0)
>             Index Cond: (i = 10)
> (6 rows)
>
> hasegeli=# select * from only r2 where i = 10;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Wed, Mar 15, 2017 at 10:02 PM, Emre Hasegeli wrote:
> I was testing with the brin correlation patch [1] applied. I cannot
> crash it without the patch either. I am sorry for not testing it
> before. The patch make BRIN selectivity estimation function access
> more information.
>
> [1] https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch

With my test case, I could not crash even with this patch applied.
Can you provide your test case? (table, index, data, query)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Bitmap scans a bit broken
> This can crash at line:414, if either tuple is invalid memory(but I
> think it's not because we have already accessed this memory in above
> if check) or dtup is invalid (this is also not possible because
> brin_new_memtuple has already accessed this).

I was testing with the brin correlation patch [1] applied. I cannot
crash it without the patch either. I am sorry for not testing it
before. The patch makes the BRIN selectivity estimation function access
more information.

[1] https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Wed, Mar 15, 2017 at 8:11 PM, Emre Hasegeli wrote:
>> * thread #1: tid = 0x5045a8f, 0x00010ae44558
>> postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8,
>> tuple=0x7fea3c891040) + 40 at brin_tuple.c:414, queue =
>> 'com.apple.main-thread', stop reason = signal SIGUSR1
>> * frame #0: 0x00010ae44558
>> postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8,
>> tuple=0x7fea3c891040) + 40 at brin_tuple.c:414 [opt]
>> frame #1: 0x00010ae4000c
>> postgres`bringetbitmap(scan=0x7fea3c875c20, tbm=) + 428 at
>> brin.c:398 [opt]
>> frame #2: 0x00010ae9b451
>> postgres`index_getbitmap(scan=0x7fea3c875c20, bitmap=) + 65
>> at indexam.c:726 [opt]
>> frame #3: 0x00010b0035a9
>> postgres`MultiExecBitmapIndexScan(node=) + 233 at
>> nodeBitmapIndexscan.c:91 [opt]
>> frame #4: 0x00010b002840 postgres`BitmapHeapNext(node=)
>> + 400 at nodeBitmapHeapscan.c:143 [opt]

Analyzing the call stack further, it seems this is not the exact call
stack where it crashed. Consider the code in brin_deform_tuple (line
414):

brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple)
{
    dtup = brin_new_memtuple(brdesc);

    if (BrinTupleIsPlaceholder(tuple))
        dtup->bt_placeholder = true;
    dtup->bt_blkno = tuple->bt_blkno;   --> line 414

This can crash at line 414 only if tuple is invalid memory (but I think
it is not, because we have already accessed this memory in the if check
above) or if dtup is invalid (this is also not possible, because
brin_new_memtuple has already accessed it).

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Wed, Mar 15, 2017 at 8:51 PM, Dilip Kumar wrote:
>> I can try to provide a test case, if that wouldn't be enough to spot
>> the problem.
>
> Thanks for reporting, I am looking into this. Meanwhile, if you can
> provide the reproducible test case then locating the issue will be
> faster.

After multiple attempts with different datasets, I am unable to
reproduce the issue. I tried with the test case below:

create table t(a int, b varchar);
insert into t values(generate_series(1,1000), repeat('x', 100));
insert into t values(generate_series(1,1), repeat('x', 100));
create index idx on t using brin(a);

postgres=# analyze t;
ANALYZE
postgres=# explain analyze select * from t where a>6;
QUERY PLAN
--
 Gather (cost=580794.52..3059826.52 rows=110414922 width=105) (actual time=92.324..91853.716 rows=110425971 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Bitmap Heap Scan on t (cost=579794.52..3058826.52 rows=46006218 width=105) (actual time=65.651..62023.020 rows=36808657 loops=3)
        Recheck Cond: (a > 6)
        Rows Removed by Index Recheck: 4
        Heap Blocks: lossy=204401
        -> Bitmap Index Scan on idx (cost=0.00..552190.79 rows=110425920 width=0) (actual time=88.215..88.215 rows=1904 loops=1)
             Index Cond: (a > 6)
 Planning time: 1.116 ms
 Execution time: 96176.881 ms
(11 rows)

Is it possible for you to provide a reproducible test case? I also
applied the patch given upthread [1], but still could not reproduce it.

[1] https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Wed, Mar 15, 2017 at 8:11 PM, Emre Hasegeli wrote:
>
> I can try to provide a test case, if that wouldn't be enough to spot
> the problem.

Thanks for reporting, I am looking into this. Meanwhile, if you can
provide the reproducible test case then locating the issue will be
faster.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel Bitmap scans a bit broken
> I don't know if this is the only problem > I'll be in this general area today, so will mention if I stumble over > anything that looks broken. I was testing the same patch with a large dataset and got a different segfault: > hasegeli=# explain select * from only mp_notification_20170225 where > server_id = 7; >QUERY PLAN > -- > Gather (cost=26682.94..476995.88 rows=1 width=215) > Workers Planned: 2 > -> Parallel Bitmap Heap Scan on mp_notification_20170225 > (cost=25682.94..475995.78 rows=1 width=215) > Recheck Cond: (server_id = 7) > -> Bitmap Index Scan on mp_notification_block_idx > (cost=0.00..25682.94 rows=4557665 width=0) > Index Cond: (server_id = 7) > (6 rows) > > hasegeli=# select * from only mp_notification_20170225 where server_id = 7; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > * thread #1: tid = 0x5045a8f, 0x00010ae44558 > postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8, > tuple=0x7fea3c891040) + 40 at brin_tuple.c:414, queue = > 'com.apple.main-thread', stop reason = signal SIGUSR1 > * frame #0: 0x00010ae44558 > postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8, > tuple=0x7fea3c891040) + 40 at brin_tuple.c:414 [opt] >frame #1: 0x00010ae4000c > postgres`bringetbitmap(scan=0x7fea3c875c20, tbm=) + 428 at > brin.c:398 [opt] >frame #2: 0x00010ae9b451 > postgres`index_getbitmap(scan=0x7fea3c875c20, bitmap=) + 65 > at indexam.c:726 [opt] >frame #3: 0x00010b0035a9 > postgres`MultiExecBitmapIndexScan(node=) + 233 at > nodeBitmapIndexscan.c:91 [opt] >frame #4: 0x00010b002840 postgres`BitmapHeapNext(node=) + > 400 at nodeBitmapHeapscan.c:143 [opt] >frame #5: 0x00010afef5d0 > postgres`ExecProcNode(node=0x7fea3c873948) + 224 at execProcnode.c:459 > [opt] >frame #6: 0x00010b004cc9 postgres`ExecGather [inlined] > gather_getnext(gatherstate=) + 520 at nodeGather.c:276 [opt] >frame #7: 0x00010b004ac1 
postgres`ExecGather(node=) + 497 > at nodeGather.c:212 [opt] >frame #8: 0x00010afef6b2 > postgres`ExecProcNode(node=0x7fea3c872f58) + 450 at execProcnode.c:541 > [opt] >frame #9: 0x00010afeaf90 postgres`standard_ExecutorRun [inlined] > ExecutePlan(estate=, planstate=, > use_parallel_mode=, operation=, numberTuples=0, > direction=, dest=) + 29 at execMain.c:1616 [opt] >frame #10: 0x00010afeaf73 > postgres`standard_ExecutorRun(queryDesc=, > direction=, count=0) + 291 at execMain.c:348 [opt] >frame #11: 0x00010af8b108 > postgres`ExplainOnePlan(plannedstmt=0x7fea3c871040, > into=0x, es=0x7fea3c805360, queryString=, > params=, planduration=) + 328 at explain.c:533 [opt] >frame #12: 0x00010af8ab98 > postgres`ExplainOneQuery(query=0x7fea3c805890, > cursorOptions=, into=0x, es=0x7fea3c805360, > queryString=,params=0x) + 280 at explain.c:369 > [opt] >frame #13: 0x00010af8a773 postgres`ExplainQuery(pstate=, > stmt=0x7fea3d005450, queryString="explain analyze select * from only > mp_notification_20170225 where server_id > 6;",params=0x, > dest=0x7fea3c8052c8) + 819 at explain.c:254 [opt] >frame #14: 0x00010b13b660 > postgres`standard_ProcessUtility(pstmt=0x7fea3d005fa8, > queryString="explain analyze select * from only mp_notification_20170225 > where server_id > 6;",context=PROCESS_UTILITY_TOPLEVEL, > params=0x, dest=0x7fea3c8052c8, > completionTag=) + 1104 at utility.c:675 [opt] >frame #15: 0x00010b13ad2a > postgres`PortalRunUtility(portal=0x7fea3c837640, > pstmt=0x7fea3d005fa8, isTopLevel='\x01', setHoldSnapshot=, > dest=0x7fea3c8052c8, completionTag=) + 90 at pquery.c:1165 > [opt] >frame #16: 0x00010b139f56 > postgres`FillPortalStore(portal=0x7fea3c837640, isTopLevel='\x01') + 182 > at pquery.c:1025 [opt] >frame #17: 0x00010b139c22 > postgres`PortalRun(portal=0x7fea3c837640, count=, > isTopLevel='\x01', dest=, altdest=, > completionTag=) + 402 at pquery.c:757 [opt] >frame #18: 0x00010b13789b postgres`PostgresMain + 44 at > postgres.c:1101 [opt] >frame #19: 0x00010b13786f 
postgres`PostgresMain(argc=, > argv=, dbname=, username=) + 8927 at > postgres.c:4066 [opt] >frame #20: 0x00010b0ba113 postgres`PostmasterMain [inlined] BackendRun > + 7587 at postmaster.c:4317 [opt] >frame #21: 0x00010b0ba0e8 postgres`PostmasterMain [inlined] > BackendStartup at postmaster.c:3989 [opt] >frame #22: 0x00010b0ba0e8 postgres`PostmasterMain at postmaster.c:1729 > [opt] >frame #23: 0x
Re: [HACKERS] Parallel Bitmap scans a bit broken
On 10 March 2017 at 06:17, Robert Haas wrote:
> On Thu, Mar 9, 2017 at 11:50 AM, Dilip Kumar wrote:
> > On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar wrote:
> >> I slightly modified your query to reproduce this issue.
> >>
> >> explain analyze select * from r1 where value<555;
> >>
> >> Patch is attached to fix the problem.
> >
> > I forgot to mention the cause of the problem.
> >
> > if (istate->schunkptr < istate->nchunks)
> > {
> >     PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
> >     PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
> >     BlockNumber chunk_blockno;
> >
> > In above if condition we have only checked istate->schunkptr <
> > istate->nchunks that means we have some chunk left so we are safe to
> > access idxchunks, But just after that we are accessing
> > ptbase[idxpages[istate->spageptr]] without checking that accessing
> > idxpages is safe or not.
> >
> > tbm_iterator already handling this case, I broke it in
> > tbm_shared_iterator.
>
> I don't know if this is the only problem -- it would be good if David
> could retest -- but it's certainly *a* problem, so committed.

Thanks for committing, and generally parallelising more stuff.

I confirm that my test case is now working again.

I'll be in this general area today, so will mention if I stumble over
anything that looks broken.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 9, 2017 at 11:50 AM, Dilip Kumar wrote:
> On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar wrote:
>> I slightly modified your query to reproduce this issue.
>>
>> explain analyze select * from r1 where value<555;
>>
>> Patch is attached to fix the problem.
>
> I forgot to mention the cause of the problem.
>
> if (istate->schunkptr < istate->nchunks)
> {
>     PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
>     PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
>     BlockNumber chunk_blockno;
>
> In above if condition we have only checked istate->schunkptr <
> istate->nchunks that means we have some chunk left so we are safe to
> access idxchunks, But just after that we are accessing
> ptbase[idxpages[istate->spageptr]] without checking that accessing
> idxpages is safe or not.
>
> tbm_iterator already handling this case, I broke it in tbm_shared_iterator.

I don't know if this is the only problem -- it would be good if David
could retest -- but it's certainly *a* problem, so committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar wrote:
> I slightly modified your query to reproduce this issue.
>
> explain analyze select * from r1 where value<555;
>
> Patch is attached to fix the problem.

I forgot to mention the cause of the problem.

if (istate->schunkptr < istate->nchunks)
{
    PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
    PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
    BlockNumber chunk_blockno;

In the above if condition we only check istate->schunkptr <
istate->nchunks, which means some chunk is left and it is safe to
access idxchunks. But right after that we access
ptbase[idxpages[istate->spageptr]] without checking whether accessing
idxpages is safe.

tbm_iterator already handles this case; I broke it in
tbm_shared_iterator.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
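The missing check described above can be sketched as a standalone function. This is a simplified illustration under stated assumptions, not the committed fix: DemoEntry, DemoIterState and demo_chunk_comes_first are hypothetical names, and the real iterator also tracks a bit position within each chunk, which is left out here. The sketch only shows that the page index array is consulted after confirming that at least one page remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page-table entry: only the block number. */
typedef struct DemoEntry
{
    unsigned    blockno;
} DemoEntry;

/* Hypothetical iterator position over exact pages and lossy chunks. */
typedef struct DemoIterState
{
    int         spageptr;       /* next exact page to return */
    int         schunkptr;      /* next lossy chunk to return */
    int         npages;         /* number of exact pages */
    int         nchunks;        /* number of lossy chunks */
} DemoIterState;

/*
 * Decide whether the next item to emit is a lossy chunk (true) or an
 * exact page (false).  idxpages may be NULL when npages is zero, so it
 * is dereferenced only after the bounds check on spageptr.
 */
bool
demo_chunk_comes_first(const DemoIterState *istate, const DemoEntry *ptbase,
                       const int *idxpages, const int *idxchunks)
{
    assert(istate != NULL);

    if (istate->schunkptr >= istate->nchunks)
        return false;           /* no chunk left */

    const DemoEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];

    /* Guard: if no page is left, the chunk is trivially next. */
    if (istate->spageptr >= istate->npages)
        return true;

    const DemoEntry *page = &ptbase[idxpages[istate->spageptr]];

    return chunk->blockno < page->blockno;
}
```

With a fully lossy bitmap (as a BRIN index produces) npages is zero, so an unguarded read of idxpages would touch memory that was never allocated; the guard makes that case return early.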
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 9, 2017 at 9:37 PM, Dilip Kumar wrote:
>> =# create table r1(value int);
>> CREATE TABLE
>> =# insert into r1 select (random()*1000)::int from
>> generate_Series(1,100);
>> INSERT 0 100
>> =# create index on r1 using brin(value);
>> CREATE INDEX
>> =# set enable_seqscan=0;
>> SET
>> =# explain select * from r1 where value=555;
>
> I am looking into the issue, I have already reproduced it. I will
> update on this soon.
>
> Thanks for reporting.

I slightly modified your query to reproduce this issue.

explain analyze select * from r1 where value<555;

Patch is attached to fix the problem.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

parallel_bitmap_fix.patch
Description: Binary data
Re: [HACKERS] Parallel Bitmap scans a bit broken
On Thu, Mar 9, 2017 at 9:17 PM, David Rowley wrote:
> patch with [1]
>
> =# create table r1(value int);
> CREATE TABLE
> =# insert into r1 select (random()*1000)::int from
> generate_Series(1,100);
> INSERT 0 100
> =# create index on r1 using brin(value);
> CREATE INDEX
> =# set enable_seqscan=0;
> SET
> =# explain select * from r1 where value=555;

I am looking into the issue, I have already reproduced it. I will
update on this soon.

Thanks for reporting.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com