Of course, I do not create GIN index.
Maybe the problem is related to checkpoint and WAL.
I don't know how to make the comparison with MongoDB fair enough.





(C)Bitnine, Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : ks...@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192


2016-07-19 11:23 GMT+09:00 Sameer Kumar <sameer.ku...@ashnik.com>:

>
>
> On Fri, 11 Mar 2016, 9:39 p.m. Paul Jones, <p...@cmicdo.com> wrote:
>
>> I have been running the EDB benchmark that compares Postgres and MongoDB.
>> I believe EDB ran it against PG 9.4 and Mongo 2.6.  I am running it
>> against PG 9.5.1 and Mongo 3.2 with WiredTiger storage using 10,000,000
>> JSON records generated by the benchmark.  It looks like Mongo is winning,
>> and apparently because of its cache management.
>>
>> The first queries on both run in ~30 min.  And, once PG fills its cache,
>> it whips Mongo on repeats of the *same* query (vmstat shows no disk
>> reads for PG).
>>
>> However, when different query on the same table is issued to both,
>> vmstat shows that PG has to read the *entire* table again, and it takes
>> ~30 min.  Mongo does a lot of reads initially but after about 5 minutes,
>> it stops reading and completes the query, most likely because it is
>> using its cache very effectively.
>>
>> Host:   Virtual Machine
>>         4 CPUs
>>         16 Gb RAM
>>         200 Gb Disk
>>         RHEL 6.6
>>
>> PG:     9.5.1 compiled from source
>>         shared_buffers = 7GB
>>         effectve_cache_size = 12GB
>>
>> Mongo:  3.2 installed with RPM from Mongo
>>
>> In PG, I created the table by:
>>
>> CREATE TABLE json_tables
>> (
>>         data    JSONB
>> );
>>
>> After loading, it creates the index:
>>
>> CREATE INDEX json_tables_idx ON json_tables USING GIN (data
>> jsonb_path_ops);
>>
>
> This would create one GIN index which is going to be a bit larger than
> usual btree /n-tree index on a specific JSON field. And would be slower
> too. I suggest that you create an index on the specific expression using
> JSON operators. In my opinion that index would be much more nearer to
> mongoDB indexes.
>
>
>
>> After a lot of experimentation, I discovered that the benchmark was not
>> using PG's index, so I modified the four queries to be:
>>
>> SELECT data FROM json_tables WHERE data @> '{"brand": "ACME"}';
>> SELECT data FROM json_tables WHERE data @> '{"name": "Phone Service Basic
>> Plan"}';
>> SELECT data FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}';
>> SELECT data FROM json_tables WHERE data @> '{"type": "service"}';
>>
>> Here are two consecutive explain analyze for PG, for the same query.
>> No functional difference in the plans that I can tell, but the effect
>> of PG's cache on the second is dramatic.
>>
>> If anyone has ideas on how I can get PG to more effectively use the cache
>> for subsequent queries, I would love to hear them.
>>
>> -------
>>
>> benchmark=# explain analyze select data from json_tables where data @>
>> '{"name": "AC3 Case Red"}';
>>
>>                                                                QUERY PLAN
>>
>>
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------
>>
>> Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=10000
>> width=1261)
>> (actual time=2157.118..1259550.327 rows=909091 loops=1)
>>    Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
>>    Rows Removed by Index Recheck: 4360296
>>    Heap Blocks: exact=37031 lossy=872059
>>    ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..111.00
>> rows=10000 width =0) (actual time=2141.250..2141.250 rows=909091 loops=1)
>>          Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
>> Planning time: 291.932 ms
>> Execution time: 1259886.920 ms
>> (8 rows)
>>
>> Time: 1261191.844 ms
>>
>> benchmark=# explain analyze select data from json_tables where data @>
>> '{"name": "AC3 Case Red"}';
>>                                                               QUERY PLAN
>>
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------
>>
>> Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=10000
>> width=1261) (actual time=779.261..29815.262 rows=909091 loops=1)
>>    Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
>>    Rows Removed by Index Recheck: 4360296
>>    Heap Blocks: exact=37031 lossy=872059
>>    ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..111.00
>> rows=10000 width =0) (actual time=769.081..769.081 rows=909091 loops=1)
>>          Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
>> Planning time: 33.967 ms
>> Execution time: 29869.381 ms
>>
>> (8 rows)
>>
>> Time: 29987.122 ms
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
> --
> --
> Best Regards
> Sameer Kumar | DB Solution Architect
> *ASHNIK PTE. LTD.*
>
> 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
>
> T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com
>

Reply via email to