Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-12-05 Thread Marc-Olaf Jaschke
Thanks for the explanation!

Best Regards,
Marc-Olaf

Marc-Olaf Jaschke · Softwareentwickler
shopping24 GmbH

Werner-Otto-Straße 1-7 · 22179 Hamburg
Telefon: +49 (0) 40 6461 5830 · Fax: +49 (0) 40 64 61 7879
marc-olaf.jasc...@s24.com · www.s24.com
AG Hamburg HRB 63371
vertreten durch Dr. Björn Schäfers und Martin Mildner

2016-12-05 3:28 GMT+01:00 Jeff Janes <jeff.ja...@gmail.com>:

>
> > big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
>
>> I wonder why bitmap heap scan adds such a big amount of time on top of
>> the plain bitmap index scan.
>> It seems to me, that the recheck is active although all blocks are exact
>> [1] and that pg is loading the jsonb for the recheck.
>>
>> Is this an expected behavior?
>>
>
>
> Yes, this is expected.  The gin index is lossy.  It knows that all the
> elements are present (except when it doesn't--large elements might get
> hashed down and suffer hash collisions), but it doesn't know what the
> recursive structure between them is, and has to do a recheck.
>
> For example, if you change your example where clause to:
>
> big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> You will see that the index still returns 50,000 rows, but now all of them
> get rejected upon the recheck.
>
> You could try changing the type of index to jsonb_path_ops.  In your given
> example, it won't make a difference, because you are actually counting half
> the table and so half the table needs to be rechecked.  But in my example,
> jsonb_path_ops successfully rejects all the rows at the index stage.
>
> Cheers,
>
> Jeff
>


[PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-11-30 Thread Marc-Olaf Jaschke
Hi,

i have a performance issue with bitmap index scans on huge amounts of big jsonb 
documents.


= Background = 

- table with big jsonb documents
- gin index on these documents
- queries using index conditions with low selectivity


= Example = 

select version(); 
> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 
> (Red Hat 4.4.7-17), 64-bit

show work_mem;
> 1GB

-- setup test data
create table bitmap_scan_test as
select
i,
(select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text))) from 
generate_series(0, 100) j) big_jsonb
from
generate_series(0, 10) i;

create index on bitmap_scan_test using gin (big_jsonb);

analyze bitmap_scan_test;


--  query with bitmap scan
explain analyze
select
count(*)
from
bitmap_scan_test
where
big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';

Aggregate  (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 
rows=1 loops=1)
  ->  Bitmap Heap Scan on bitmap_scan_test  (cost=120.78..272.49 rows=100 
width=0) (actual time=16.496..617.431 rows=5 loops=1)
Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": 
"cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
Heap Blocks: exact=637
->  Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx  
(cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=5 
loops=1)
  Index Cond: (big_jsonb @> '[{"x": 1, "filler": 
"cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
Planning time: 0.106 ms
Execution time: 622.334 ms


perf top -p... shows heavy usage of pglz_decompress:

Overhead  Shared Object Symbol
  51,06%  postgres  [.] pglz_decompress
   7,33%  libc-2.12.so  [.] memcpy
...

= End of example =  


I wonder why bitmap heap scan adds such a big amount of time on top of the 
plain bitmap index scan. 
It seems to me, that the recheck is active although all blocks are exact [1] 
and that pg is loading the jsonb for the recheck.

Is this an expected behavior?


Regards,
Marc-Olaf


[1] 
(http://dba.stackexchange.com/questions/106264/recheck-cond-line-in-query-plans-with-a-bitmap-index-scan)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance