Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Robert Haas

On Thu, Mar 16, 2017 at 1:50 PM, Dilip Kumar  wrote:
> fixed

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Dilip Kumar

On Thu, Mar 16, 2017 at 8:26 PM, Emre Hasegeli  wrote:
>> Hopefully, this time I got it correct.  Since I am unable to reproduce
>> the issue so I will again need your help in verifying the fix.
>
> It is not crashing with the new patch.  Thank you.

Thanks for verifying.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Dilip Kumar

On Thu, Mar 16, 2017 at 8:42 PM, Robert Haas  wrote:
> Thanks for confirming.  Some review comments on v2:
>
> +if (istate->pagetable)
>
> Please compare explicitly to InvalidDsaPointer.
fixed
>
> +if (iterator->ptbase)
> +ptbase = iterator->ptbase->ptentry;
> +if (iterator->ptpages)
> +idxpages = iterator->ptpages->index;
> +if (iterator->ptchunks)
> +idxchunks = iterator->ptchunks->index;
>
> Similarly.
fixed

Also fixed at
+ if (ptbase)
+   pg_atomic_init_u32(&ptbase->refcount, 0);

>
> Dilip, please also provide a proposed commit message describing what
> this is fixing.  Is it just the TBM_EMPTY case, or is there anything
> else?

Okay, I have added the commit message in the patch.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


fix_tbm_empty_v3.patch
Description: Binary data


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Robert Haas

On Thu, Mar 16, 2017 at 10:56 AM, Emre Hasegeli  wrote:
>> Hopefully, this time I got it correct.  Since I am unable to reproduce
>> the issue so I will again need your help in verifying the fix.
>
> It is not crashing with the new patch.  Thank you.

Thanks for confirming.  Some review comments on v2:

+if (istate->pagetable)

Please compare explicitly to InvalidDsaPointer.

+if (iterator->ptbase)
+ptbase = iterator->ptbase->ptentry;
+if (iterator->ptpages)
+idxpages = iterator->ptpages->index;
+if (iterator->ptchunks)
+idxchunks = iterator->ptchunks->index;

Similarly.
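
A minimal sketch of what an explicit comparison can look like, for readers unfamiliar with the convention.  A dsa_pointer is an integer handle rather than a C pointer, so the intent reads more clearly when it is compared against the invalid sentinel by name.  Every name prefixed demo_ or DEMO_ below is a hypothetical stand-in chosen for illustration; this is not the tidbitmap.c code and not the contents of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins (assumptions); the real definitions live in utils/dsa.h. */
typedef uint64_t demo_dsa_pointer;
#define DEMO_INVALID_DSA_POINTER ((demo_dsa_pointer) 0)

/* Hypothetical shared state holding three optional allocations. */
typedef struct DemoSharedState
{
    demo_dsa_pointer pagetable;
    demo_dsa_pointer spages;
    demo_dsa_pointer schunks;
} DemoSharedState;

/* Count how many of the optional allocations were actually made. */
static int
demo_count_allocated(const DemoSharedState *state)
{
    int         n = 0;

    assert(state != NULL);

    /* Compare against the sentinel by name instead of writing "if (x)". */
    if (state->pagetable != DEMO_INVALID_DSA_POINTER)
        n++;
    if (state->spages != DEMO_INVALID_DSA_POINTER)
        n++;
    if (state->schunks != DEMO_INVALID_DSA_POINTER)
        n++;
    return n;
}
```

For a state in which nothing was allocated, the function returns 0 without touching any of the handles.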

Dilip, please also provide a proposed commit message describing what
this is fixing.  Is it just the TBM_EMPTY case, or is there anything
else?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Emre Hasegeli

> Hopefully, this time I got it correct.  Since I am unable to reproduce
> the issue so I will again need your help in verifying the fix.

It is not crashing with the new patch.  Thank you.



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Dilip Kumar

On Thu, Mar 16, 2017 at 5:14 PM, Dilip Kumar  wrote:
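>>> * thread #1: tid = 0x51828fd, 0x000100caf314 
>>> postgres`tbm_prepare_shared_iterate [inlined] 
>>> pg_atomic_write_u32_impl(val=0) at generic.h:57, queue = 
>>> 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)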
>>>   * frame #0: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
>>> [inlined] pg_atomic_write_u32_impl(val=0) at generic.h:57 [opt]
>>> frame #1: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
>>> [inlined] pg_atomic_init_u32_impl(val_=0) at generic.h:163 [opt]
>>> frame #2: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
>>> [inlined] pg_atomic_init_u32(val=0) + 17 at atomics.h:237 [opt]
>
> By looking at the call stack I got the problem location.  I am
> reviewing other parts of the code if there are the similar mistake at
> other places. Soon I will post the patch.  Thanks for the help.

Based on the call stack, I have tried to fix the issue.  The problem was
an uninitialized pointer access in some special cases, i.e. TBM_EMPTY,
when the pagetable is not created at all.

fix_tbm_empty.patch fixed some of these but introduced the one that you
are seeing in your call stack.
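
To illustrate the kind of failure in that backtrace, here is a minimal sketch, assuming a bitmap whose page table header is simply absent when the bitmap is empty.  It uses C11 atomics as a stand-in for pg_atomic_init_u32, and DemoPTBase, DemoBitmap and demo_prepare_shared are hypothetical names, not the PostgreSQL structures or the code in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared page table header. */
typedef struct DemoPTBase
{
    atomic_uint refcount;
} DemoPTBase;

/* Hypothetical bitmap: ptbase stays NULL when no page table was created. */
typedef struct DemoBitmap
{
    DemoPTBase *ptbase;
} DemoBitmap;

/*
 * Prepare the shared state.  Returns 1 if a reference count was
 * initialised and 0 if the bitmap was empty and there was nothing to do.
 */
static int
demo_prepare_shared(DemoBitmap *tbm)
{
    assert(tbm != NULL);

    /* Without this check, an empty bitmap would write through address 0. */
    if (tbm->ptbase == NULL)
        return 0;

    atomic_init(&tbm->ptbase->refcount, 0);
    return 1;
}
```

The unguarded form of the last statement is what would fault with address=0x0 on an empty bitmap.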

Hopefully I got it right this time.  Since I am unable to reproduce
the issue, I will again need your help in verifying the fix.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


fix_tbm_empty_v2.patch
Description: Binary data


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Dilip Kumar

On Thu, Mar 16, 2017 at 3:52 PM, Emre Hasegeli  wrote:
>> * thread #1: tid = 0x51828fd, 0x000100caf314 
>> postgres`tbm_prepare_shared_iterate [inlined] 
>> pg_atomic_write_u32_impl(val=0) at generic.h:57, queue = 
>> 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
>>   * frame #0: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
>> [inlined] pg_atomic_write_u32_impl(val=0) at generic.h:57 [opt]
>> frame #1: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
>> [inlined] pg_atomic_init_u32_impl(val_=0) at generic.h:163 [opt]
>> frame #2: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
>> [inlined] pg_atomic_init_u32(val=0) + 17 at atomics.h:237 [opt]

From the call stack I have located the problem.  I am
reviewing other parts of the code to see whether the same mistake
exists elsewhere.  I will post the patch soon.  Thanks for the help.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Emre Hasegeli

> Are you getting the crash with the same test case?

Yes.  Here is the new backtrace:

> * thread #1: tid = 0x51828fd, 0x000100caf314 
> postgres`tbm_prepare_shared_iterate [inlined] pg_atomic_write_u32_impl(val=0) 
> at generic.h:57, queue = 'com.apple.main-thread', stop reason = 
> EXC_BAD_ACCESS (code=1, address=0x0)
>   * frame #0: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
> [inlined] pg_atomic_write_u32_impl(val=0) at generic.h:57 [opt]
> frame #1: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
> [inlined] pg_atomic_init_u32_impl(val_=0) at generic.h:163 [opt]
> frame #2: 0x000100caf314 postgres`tbm_prepare_shared_iterate 
> [inlined] pg_atomic_init_u32(val=0) + 17 at atomics.h:237 [opt]
> frame #3: 0x000100caf303 
> postgres`tbm_prepare_shared_iterate(tbm=) + 723 at 
> tidbitmap.c:875 [opt]
> frame #4: 0x000100c74844 postgres`BitmapHeapNext(node=) 
> + 436 at nodeBitmapHeapscan.c:154 [opt]
> frame #5: 0x000100c615b0 
> postgres`ExecProcNode(node=0x7fdabf8189f0) + 224 at execProcnode.c:459 
> [opt]
> frame #6: 0x000100c76ca9 postgres`ExecGather [inlined] 
> gather_getnext(gatherstate=) + 520 at nodeGather.c:276 [opt]
> frame #7: 0x000100c76aa1 postgres`ExecGather(node=) + 
> 497 at nodeGather.c:212 [opt]
> frame #8: 0x000100c61692 
> postgres`ExecProcNode(node=0x7fdabf818558) + 450 at execProcnode.c:541 
> [opt]
> frame #9: 0x000100c5cf70 postgres`standard_ExecutorRun [inlined] 
> ExecutePlan(estate=, planstate=, 
> use_parallel_mode=, operation=, numberTuples=0, 
> direction=, dest=) + 29 at execMain.c:1616 [opt]
>frame #10: 0x000100c5cf53 
> postgres`standard_ExecutorRun(queryDesc=, 
> direction=, count=0) + 291 at execMain.c:348 [opt]
>frame #11: 0x000100dac0df 
> postgres`PortalRunSelect(portal=0x7fdac000b240, forward=, 
> count=0, dest=) + 255 at pquery.c:921 [opt]
>frame #12: 0x000100dabc84 
> postgres`PortalRun(portal=0x7fdac000b240, count=, 
> isTopLevel='\x01', dest=, altdest=, 
> completionTag=) + 500 at pquery.c:762 [opt]
>frame #13: 0x000100da989b postgres`PostgresMain + 44 at 
> postgres.c:1101 [opt]
>frame #14: 0x000100da986f postgres`PostgresMain(argc=, 
> argv=, dbname=, username=) + 8927 at 
> postgres.c:4066 [opt]
>frame #15: 0x000100d2c113 postgres`PostmasterMain [inlined] BackendRun 
> + 7587 at postmaster.c:4317 [opt]
>frame #16: 0x000100d2c0e8 postgres`PostmasterMain [inlined] 
> BackendStartup at postmaster.c:3989 [opt]
>frame #17: 0x000100d2c0e8 postgres`PostmasterMain at postmaster.c:1729 
> [opt]
>frame #18: 0x000100d2c0e8 postgres`PostmasterMain(argc=, 
> argv=) + 7544 at postmaster.c:1337 [opt]
>frame #19: 0x000100ca528f postgres`main(argc=, 
> argv=) + 1567 at main.c:228 [opt]
>frame #20: 0x7fffb4e28255 libdyld.dylib`start + 1
>frame #21: 0x7fffb4e28255 libdyld.dylib`start + 1



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar

On Thu, Mar 16, 2017 at 5:02 AM, Dilip Kumar  wrote:
> After above fix, I am not able to reproduce. Can you give me the
> backtrace of the crash location or the dump?
>
> I am trying on the below commit
>
> commit c5832346625af4193b1242e57e7d13e66a220b38
> Author: Stephen Frost 
> Date:   Wed Mar 15 11:19:39 2017 -0400
>
> + 
> https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch
> + fix_tbm_empty.patch

I forgot to mention that after the fix I am seeing this output.

postgres=# explain analyze select * from only r2 where i = 10;
  QUERY PLAN
---
 Gather  (cost=2880.56..9251.98 rows=1 width=4) (actual
time=3.857..3.857 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Bitmap Heap Scan on r2  (cost=1880.56..8251.88 rows=1
width=4) (actual time=0.043..0.043 rows=0 loops=3)
 Recheck Cond: (i = 10)
 ->  Bitmap Index Scan on r2_i_idx  (cost=0.00..1880.56
rows=373694 width=0) (actual time=0.052..0.052 rows=0 loops=1)
   Index Cond: (i = 10)
 Planning time: 0.111 ms
 Execution time: 4.449 ms
(9 rows)

postgres=# select * from only r2 where i = 10;
 i
---
(0 rows)

Are you getting the crash with the same test case?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar

On Thu, Mar 16, 2017 at 12:56 AM, Emre Hasegeli  wrote:
>> Please verify the fix.
>
> The same test with both of the patches applied still crashes for me.

After the above fix, I am not able to reproduce it.  Can you give me the
backtrace of the crash location or the dump?

I am testing on the commit below:

commit c5832346625af4193b1242e57e7d13e66a220b38
Author: Stephen Frost 
Date:   Wed Mar 15 11:19:39 2017 -0400

+ 
https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch
+ fix_tbm_empty.patch


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli

> Please verify the fix.

The same test with both of the patches applied still crashes for me.



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar

On Wed, Mar 15, 2017 at 10:21 PM, Emre Hasegeli  wrote:
>> hasegeli=# create table r2 as select (random() * 3)::int as i from 
>> generate_series(1, 100);
>> SELECT 100
>> hasegeli=# create index on r2 using brin (i);
>> CREATE INDEX
>> hasegeli=# analyze r2;
>> ANALYZE
>> hasegeli=# explain select * from only r2 where i = 10;
>>  QUERY PLAN
>> -
>>  Gather  (cost=2867.50..9225.32 rows=1 width=4)
>>Workers Planned: 2
>>->  Parallel Bitmap Heap Scan on r2  (cost=1867.50..8225.22 rows=1 
>> width=4)
>>  Recheck Cond: (i = 10)
>>  ->  Bitmap Index Scan on r2_i_idx  (cost=0.00..1867.50 rows=371082 
>> width=0)
>>Index Cond: (i = 10)
>> (6 rows)
>>
>> hasegeli=# select * from only r2 where i = 10;

I am able to reproduce the bug, and the attached patch fixes it.
The problem is that I am not handling the TBM_EMPTY state properly.  I
remember that while reviewing the patch Robert mentioned that we might
need to handle TBM_EMPTY, and I said that since we do not handle it
in non-parallel mode we don't need to handle it here either.  But I
was wrong.  If the state is not TBM_HASH, the code directly assumes
TBM_ONE_PAGE, which is completely wrong.  I have fixed that, and also
fixed other similar locations.
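
As a rough illustration of the state handling described above, the sketch below treats all three states explicitly, so that "not hash" is never read as "one page".  The DEMO_ enum values mirror the TBM_EMPTY, TBM_ONE_PAGE and TBM_HASH names from this thread, but DemoBitmap and demo_entries_to_publish are hypothetical and only assume that an empty bitmap has nothing to publish.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the three bitmap states discussed in this thread. */
typedef enum DemoStatus
{
    DEMO_EMPTY,                 /* no page table was created at all */
    DEMO_ONE_PAGE,              /* exactly one entry, stored inline */
    DEMO_HASH                   /* entries live in a hash table */
} DemoStatus;

typedef struct DemoBitmap
{
    DemoStatus  status;
    int         nentries;       /* only meaningful for DEMO_HASH */
} DemoBitmap;

/* Number of page table entries that would need to be published. */
static int
demo_entries_to_publish(const DemoBitmap *tbm)
{
    assert(tbm != NULL);

    switch (tbm->status)
    {
        case DEMO_HASH:
            return tbm->nentries;
        case DEMO_ONE_PAGE:
            return 1;
        case DEMO_EMPTY:
            return 0;           /* the case an if/else on DEMO_HASH misses */
    }
    return 0;                   /* not reached for valid status values */
}
```

Writing the choice as a switch over every enum value, rather than an if/else on one of them, lets the compiler warn when a state is left out.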

Please verify the fix.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

fix_tbm_empty.patch
Description: Binary data


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli

> With my test case, I could not crash even with this patch applied.
> Can you provide your test case?

Yes:

> hasegeli=# create table r2 as select (random() * 3)::int as i from 
> generate_series(1, 100);
> SELECT 100
> hasegeli=# create index on r2 using brin (i);
> CREATE INDEX
> hasegeli=# analyze r2;
> ANALYZE
> hasegeli=# explain select * from only r2 where i = 10;
>  QUERY PLAN
> -
>  Gather  (cost=2867.50..9225.32 rows=1 width=4)
>Workers Planned: 2
>->  Parallel Bitmap Heap Scan on r2  (cost=1867.50..8225.22 rows=1 width=4)
>  Recheck Cond: (i = 10)
>  ->  Bitmap Index Scan on r2_i_idx  (cost=0.00..1867.50 rows=371082 
> width=0)
>Index Cond: (i = 10)
> (6 rows)
>
> hasegeli=# select * from only r2 where i = 10;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar

On Wed, Mar 15, 2017 at 10:02 PM, Emre Hasegeli  wrote:
> I was testing with the brin correlation patch [1] applied.  I cannot
> crash it without the patch either.  I am sorry for not testing it
> before.  The patch make BRIN selectivity estimation function access
> more information.
>
> [1] 
> https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch

With my test case, I could not crash even with this patch applied.
Can you provide your test case?
(table, index, data, query)


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli

> This can crash at line:414, if either tuple is invalid memory(but I
> think it's not because we have already accessed this memory in above
> if check) or dtup is invalid (this is also not possible because
> brin_new_memtuple has already accessed this).

I was testing with the brin correlation patch [1] applied.  I cannot
crash it without the patch either.  I am sorry for not testing that
before.  The patch makes the BRIN selectivity estimation function access
more information.

[1] 
https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar

On Wed, Mar 15, 2017 at 8:11 PM, Emre Hasegeli  wrote:
>> * thread #1: tid = 0x5045a8f, 0x00010ae44558 
>> postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8, 
>> tuple=0x7fea3c891040) + 40 at brin_tuple.c:414, queue = 
>> 'com.apple.main-thread', stop reason = signal SIGUSR1
>>  * frame #0: 0x00010ae44558 
>> postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8, 
>> tuple=0x7fea3c891040) + 40 at brin_tuple.c:414 [opt]
>>frame #1: 0x00010ae4000c 
>> postgres`bringetbitmap(scan=0x7fea3c875c20, tbm=) + 428 at 
>> brin.c:398 [opt]
>>frame #2: 0x00010ae9b451 
>> postgres`index_getbitmap(scan=0x7fea3c875c20, bitmap=) + 65 
>> at indexam.c:726 [opt]
>>frame #3: 0x00010b0035a9 
>> postgres`MultiExecBitmapIndexScan(node=) + 233 at 
>> nodeBitmapIndexscan.c:91 [opt]
>>frame #4: 0x00010b002840 postgres`BitmapHeapNext(node=) 
>> + 400 at nodeBitmapHeapscan.c:143 [opt]

Analyzing the call stack further, it seems this is not the exact call
stack at the point of the crash.  Consider the code in
brin_deform_tuple (line 414):

brin_deform_tuple(BrinDesc *brdesc, BrinTuple *tuple)
{
    dtup = brin_new_memtuple(brdesc);

    if (BrinTupleIsPlaceholder(tuple))
        dtup->bt_placeholder = true;
    dtup->bt_blkno = tuple->bt_blkno;  --> line 414

This can crash at line 414 only if tuple points to invalid memory (but I
think it does not, because we have already accessed this memory in the
if check above) or if dtup is invalid (also not possible, because
brin_new_memtuple has already accessed it).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar

On Wed, Mar 15, 2017 at 8:51 PM, Dilip Kumar  wrote:
>> I can try to provide a test case, if that wouldn't be enough to spot
>> the problem.
>
> Thanks for reporting, I am looking into this.  Meanwhile, if you can
> provide the reproducible test case then locating the issue will be
> faster.

After multiple attempts with different datasets, I am unable to
reproduce the issue.

I tried the test case below:
create table t(a int, b varchar);
 insert into t values(generate_series(1,1000), repeat('x', 100));
 insert into t values(generate_series(1,1), repeat('x', 100));
create index idx on t using brin(a);
postgres=# analyze t;
ANALYZE
postgres=# explain analyze select * from t where a>6;

QUERY PLAN
--
 Gather  (cost=580794.52..3059826.52 rows=110414922 width=105) (actual
time=92.324..91853.716 rows=110425971 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Bitmap Heap Scan on t  (cost=579794.52..3058826.52
rows=46006218 width=105) (actual time=65.651..62023.020 rows=36808657
loops=3)
 Recheck Cond: (a > 6)
 Rows Removed by Index Recheck: 4
 Heap Blocks: lossy=204401
 ->  Bitmap Index Scan on idx  (cost=0.00..552190.79
rows=110425920 width=0) (actual time=88.215..88.215 rows=1904
loops=1)
   Index Cond: (a > 6)
 Planning time: 1.116 ms
 Execution time: 96176.881 ms
(11 rows)

Is it possible for you to provide a reproducible test case?  I also
applied the patch given upthread [1] but still could not reproduce it.


[1] 
https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Dilip Kumar

On Wed, Mar 15, 2017 at 8:11 PM, Emre Hasegeli  wrote:
>
> I can try to provide a test case, if that wouldn't be enough to spot
> the problem.

Thanks for reporting; I am looking into this.  Meanwhile, if you can
provide a reproducible test case, locating the issue will be
faster.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-15 Thread Emre Hasegeli

> I don't know if this is the only problem

> I'll be in this general area today, so will mention if I stumble over
> anything that looks broken.

I was testing the same patch with a large dataset and got a different segfault:

> hasegeli=# explain select * from only mp_notification_20170225 where 
> server_id = 7;
>QUERY PLAN
> --
> Gather  (cost=26682.94..476995.88 rows=1 width=215)
>   Workers Planned: 2
>   ->  Parallel Bitmap Heap Scan on mp_notification_20170225  
> (cost=25682.94..475995.78 rows=1 width=215)
> Recheck Cond: (server_id = 7)
> ->  Bitmap Index Scan on mp_notification_block_idx  
> (cost=0.00..25682.94 rows=4557665 width=0)
>   Index Cond: (server_id = 7)
> (6 rows)
>
> hasegeli=# select * from only mp_notification_20170225 where server_id = 7;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

> * thread #1: tid = 0x5045a8f, 0x00010ae44558 
> postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8, 
> tuple=0x7fea3c891040) + 40 at brin_tuple.c:414, queue = 
> 'com.apple.main-thread', stop reason = signal SIGUSR1
>  * frame #0: 0x00010ae44558 
> postgres`brin_deform_tuple(brdesc=0x7fea3c86a3a8, 
> tuple=0x7fea3c891040) + 40 at brin_tuple.c:414 [opt]
>frame #1: 0x00010ae4000c 
> postgres`bringetbitmap(scan=0x7fea3c875c20, tbm=) + 428 at 
> brin.c:398 [opt]
>frame #2: 0x00010ae9b451 
> postgres`index_getbitmap(scan=0x7fea3c875c20, bitmap=) + 65 
> at indexam.c:726 [opt]
>frame #3: 0x00010b0035a9 
> postgres`MultiExecBitmapIndexScan(node=) + 233 at 
> nodeBitmapIndexscan.c:91 [opt]
>frame #4: 0x00010b002840 postgres`BitmapHeapNext(node=) + 
> 400 at nodeBitmapHeapscan.c:143 [opt]
>frame #5: 0x00010afef5d0 
> postgres`ExecProcNode(node=0x7fea3c873948) + 224 at execProcnode.c:459 
> [opt]
>frame #6: 0x00010b004cc9 postgres`ExecGather [inlined] 
> gather_getnext(gatherstate=) + 520 at nodeGather.c:276 [opt]
>frame #7: 0x00010b004ac1 postgres`ExecGather(node=) + 497 
> at nodeGather.c:212 [opt]
>frame #8: 0x00010afef6b2 
> postgres`ExecProcNode(node=0x7fea3c872f58) + 450 at execProcnode.c:541 
> [opt]
>frame #9: 0x00010afeaf90 postgres`standard_ExecutorRun [inlined] 
> ExecutePlan(estate=, planstate=, 
> use_parallel_mode=, operation=, numberTuples=0, 
> direction=, dest=) + 29 at execMain.c:1616 [opt]
>frame #10: 0x00010afeaf73 
> postgres`standard_ExecutorRun(queryDesc=, 
> direction=, count=0) + 291 at execMain.c:348 [opt]
>frame #11: 0x00010af8b108 
> postgres`ExplainOnePlan(plannedstmt=0x7fea3c871040, 
> into=0x, es=0x7fea3c805360, queryString=, 
> params=, planduration=) + 328 at explain.c:533 [opt]
>frame #12: 0x00010af8ab98 
> postgres`ExplainOneQuery(query=0x7fea3c805890, 
> cursorOptions=, into=0x, es=0x7fea3c805360, 
> queryString=,params=0x) + 280 at explain.c:369 
> [opt]
>frame #13: 0x00010af8a773 postgres`ExplainQuery(pstate=, 
> stmt=0x7fea3d005450, queryString="explain analyze select * from only 
> mp_notification_20170225 where server_id > 6;",params=0x, 
> dest=0x7fea3c8052c8) + 819 at explain.c:254 [opt]
>frame #14: 0x00010b13b660 
> postgres`standard_ProcessUtility(pstmt=0x7fea3d005fa8, 
> queryString="explain analyze select * from only mp_notification_20170225 
> where server_id > 6;",context=PROCESS_UTILITY_TOPLEVEL, 
> params=0x, dest=0x7fea3c8052c8, 
> completionTag=) + 1104 at utility.c:675 [opt]
>frame #15: 0x00010b13ad2a 
> postgres`PortalRunUtility(portal=0x7fea3c837640, 
> pstmt=0x7fea3d005fa8, isTopLevel='\x01', setHoldSnapshot=, 
> dest=0x7fea3c8052c8, completionTag=) + 90 at pquery.c:1165 
> [opt]
>frame #16: 0x00010b139f56 
> postgres`FillPortalStore(portal=0x7fea3c837640, isTopLevel='\x01') + 182 
> at pquery.c:1025 [opt]
>frame #17: 0x00010b139c22 
> postgres`PortalRun(portal=0x7fea3c837640, count=, 
> isTopLevel='\x01', dest=, altdest=, 
> completionTag=) + 402 at pquery.c:757 [opt]
>frame #18: 0x00010b13789b postgres`PostgresMain + 44 at 
> postgres.c:1101 [opt]
>frame #19: 0x00010b13786f postgres`PostgresMain(argc=, 
> argv=, dbname=, username=) + 8927 at 
> postgres.c:4066 [opt]
>frame #20: 0x00010b0ba113 postgres`PostmasterMain [inlined] BackendRun 
> + 7587 at postmaster.c:4317 [opt]
>frame #21: 0x00010b0ba0e8 postgres`PostmasterMain [inlined] 
> BackendStartup at postmaster.c:3989 [opt]
>frame #22: 0x00010b0ba0e8 postgres`PostmasterMain at postmaster.c:1729 
> [opt]
>frame #23: 0x

Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread David Rowley

On 10 March 2017 at 06:17, Robert Haas  wrote:

> On Thu, Mar 9, 2017 at 11:50 AM, Dilip Kumar 
> wrote:
> > On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar 
> wrote:
> >> I slightly modified your query to reproduce this issue.
> >>
> >> explain analyze select * from r1 where value<555;
> >>
> >> Patch is attached to fix the problem.
> >
> > I forgot to mention the cause of the problem.
> >
> > if (istate->schunkptr < istate->nchunks)
> > {
> >PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
> > PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
> > BlockNumber chunk_blockno;
> >
> > In above if condition we have only checked istate->schunkptr <
> > istate->nchunks that means we have some chunk left so we are safe to
> > access idxchunks,  But just after that we are accessing
> > ptbase[idxpages[istate->spageptr]] without checking that accessing
> > idxpages is safe or not.
> >
> > tbm_iterator already handling this case, I broke it in
> tbm_shared_iterator.
>
> I don't know if this is the only problem -- it would be good if David
> could retest -- but it's certainly *a* problem, so committed.
>

Thanks for committing, and generally parallelising more stuff.

I confirm that my test case is now working again.

I'll be in this general area today, so will mention if I stumble over
anything that looks broken.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Robert Haas

On Thu, Mar 9, 2017 at 11:50 AM, Dilip Kumar  wrote:
> On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar  wrote:
>> I slightly modified your query to reproduce this issue.
>>
>> explain analyze select * from r1 where value<555;
>>
>> Patch is attached to fix the problem.
>
> I forgot to mention the cause of the problem.
>
> if (istate->schunkptr < istate->nchunks)
> {
>PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
> PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
> BlockNumber chunk_blockno;
>
> In above if condition we have only checked istate->schunkptr <
> istate->nchunks that means we have some chunk left so we are safe to
> access idxchunks,  But just after that we are accessing
> ptbase[idxpages[istate->spageptr]] without checking that accessing
> idxpages is safe or not.
>
> tbm_iterator already handling this case, I broke it in tbm_shared_iterator.

I don't know if this is the only problem -- it would be good if David
could retest -- but it's certainly *a* problem, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Dilip Kumar

On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar  wrote:
> I slightly modified your query to reproduce this issue.
>
> explain analyze select * from r1 where value<555;
>
> Patch is attached to fix the problem.

I forgot to mention the cause of the problem.

if (istate->schunkptr < istate->nchunks)
{
    PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
    PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
    BlockNumber chunk_blockno;

The if condition above only checks istate->schunkptr <
istate->nchunks, which means some chunk is left and it is safe to
access idxchunks.  But right after that we access
ptbase[idxpages[istate->spageptr]] without checking whether accessing
idxpages is safe.

tbm_iterator already handles this case; I broke it in tbm_shared_iterator.
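
The pattern described above can be sketched on a simplified iterator that merges two sorted arrays of block numbers.  This is an illustrative reduction under my own assumptions, not the patched tidbitmap.c code: DemoIter and demo_next_block are hypothetical names, and plain unsigned arrays stand in for the page and chunk index arrays.  The point is that the page cursor is bounds-checked before the page array is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator over two sorted arrays of block numbers. */
typedef struct DemoIter
{
    const unsigned *pages;      /* exact pages, ascending */
    size_t      npages;
    size_t      spageptr;       /* next unread index in pages[] */
    const unsigned *chunks;     /* lossy chunks, ascending */
    size_t      nchunks;
    size_t      schunkptr;      /* next unread index in chunks[] */
} DemoIter;

/* Return the next block number in ascending order, or -1 when exhausted. */
static long
demo_next_block(DemoIter *it)
{
    assert(it != NULL);

    if (it->schunkptr < it->nchunks)
    {
        unsigned    chunk_blockno = it->chunks[it->schunkptr];

        /*
         * Read pages[] only after confirming that a page is left; the
         * short-circuit || keeps the array access behind the bounds check.
         */
        if (it->spageptr >= it->npages ||
            chunk_blockno < it->pages[it->spageptr])
        {
            it->schunkptr++;
            return (long) chunk_blockno;
        }
    }

    if (it->spageptr < it->npages)
        return (long) it->pages[it->spageptr++];

    return -1;
}
```

With chunks only and no pages at all, every call takes the first branch and the page array is never dereferenced.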

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Dilip Kumar

On Thu, Mar 9, 2017 at 9:37 PM, Dilip Kumar  wrote:
>> =# create table r1(value int);
>> CREATE TABLE
>> =# insert into r1 select (random()*1000)::int from
>> generate_Series(1,100);
>> INSERT 0 100
>> =# create index on r1 using brin(value);
>> CREATE INDEX
>> =# set enable_seqscan=0;
>> SET
>> =# explain select * from r1 where value=555;
>
> I am looking into the issue, I have already reproduced it.  I will
> update on this soon.
>
> Thanks for reporting.

I slightly modified your query to reproduce this issue.

explain analyze select * from r1 where value<555;

Patch is attached to fix the problem.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


parallel_bitmap_fix.patch
Description: Binary data


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Dilip Kumar

On Thu, Mar 9, 2017 at 9:17 PM, David Rowley
 wrote:
> patch with [1]
>
> =# create table r1(value int);
> CREATE TABLE
> =# insert into r1 select (random()*1000)::int from
> generate_Series(1,100);
> INSERT 0 100
> =# create index on r1 using brin(value);
> CREATE INDEX
> =# set enable_seqscan=0;
> SET
> =# explain select * from r1 where value=555;

I am looking into the issue and have already reproduced it.  I will
post an update soon.

Thanks for reporting.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


