On Sun, Feb 26, 2017 at 4:28 PM, Sven R. Kunze <srku...@mail.de> wrote:
> Hello everyone,
>
> I am currently evaluating the possibility of using PostgreSQL for storing
> and querying jsonb+tsvector data. Let's consider this setup:
>
> create table docs (id serial primary key, meta jsonb);
> -- generate 10M entries, cf. appendix
> create index docs_meta_idx ON docs using gin (meta jsonb_path_ops);
> create index docs_name_idx ON docs using gin (to_tsvector('english',
>     meta->>'name'));
> create index docs_address_idx ON docs using gin (to_tsvector('english',
>     meta->>'address'));

A functional index tends to be slow here; better to use a separate column
(or columns) for the tsvector (see the sketch at the end of this mail).

> Testing around with some smaller datasets, functionality-wise it's great.
> However, increasing to 10M entries, things tend to slow down (using
> PostgreSQL 9.5):
>
> explain analyze select id from docs where meta @> '{"age": 20}';
> Planning time: 0.121 ms
> Execution time: 4873.507 ms
>
> explain analyze select id from docs where meta @> '{"age": 20}';
> Planning time: 0.122 ms
> Execution time: 206.289 ms
>
> explain analyze select id from docs where meta @> '{"age": 30}';
> Planning time: 0.109 ms
> Execution time: 7496.886 ms
>
> explain analyze select id from docs where meta @> '{"age": 30}';
> Planning time: 0.114 ms
> Execution time: 1169.649 ms
>
> explain analyze select id from docs where to_tsvector('english',
> meta->>'name') @@ to_tsquery('english', 'john');
> Planning time: 0.179 ms
> Execution time: 10109.375 ms
>
> explain analyze select id from docs where to_tsvector('english',
> meta->>'name') @@ to_tsquery('english', 'john');
> Planning time: 0.188 ms
> Execution time: 238.854 ms

What is the full output of explain analyze? Ideally run it as "explain
(analyze, buffers)"; see the example at the end of this mail.

> Using "select pg_prewarm('docs');" on the table, and likewise on any of
> the indexes, doesn't help either. After a "systemctl stop
> postgresql.service && sync && echo 3 > /proc/sys/vm/drop_caches &&
> systemctl start postgresql.service" the age=20, age=30, and name=john
> queries are slow again.
>
> Is there a way to speed things up, or to warm up the caches permanently?
>
> Regards,
> Sven
>
>
> Appendix I:
>
> Example json:
>
> {"age": 20, "name": "Michelle Hernandez", "birth": "1991-08-16",
>  "address": "94753 Tina Bridge Suite 318\\nEmilyport, MT 75302"}
>
>
> Appendix II:
>
> The Python script to generate fake json data. Needs "pip install faker".
>
> $ python fake_json.py > test.json   # generates 2M entries; takes some time
> $ cat test.json | psql -c 'copy docs (meta) from stdin'
> $ cat test.json | psql -c 'copy docs (meta) from stdin'
> $ cat test.json | psql -c 'copy docs (meta) from stdin'
> $ cat test.json | psql -c 'copy docs (meta) from stdin'
> $ cat test.json | psql -c 'copy docs (meta) from stdin'
>
> -- fake_json.py --
>
> import faker, json
>
> fake = faker.Faker()
> for i in range(2 * 10**6):
>     # escape embedded newlines so COPY's text format accepts the rows
>     print(json.dumps({"name": fake.name(), "birth": fake.date(),
>                       "address": fake.address(),
>                       "age": fake.random_int(0, 100)}).replace('\n', '\\n'))
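To expand on the separate-column suggestion: on 9.5 this needs a trigger,
since tsvector_update_trigger() only works on plain text columns and can't
evaluate a jsonb expression. A minimal sketch for the name field (the
column, index, function, and trigger names below are made up for
illustration):

    alter table docs add column name_tsv tsvector;
    update docs set name_tsv = to_tsvector('english', meta->>'name');
    create index docs_name_tsv_idx on docs using gin (name_tsv);

    create function docs_tsv_update() returns trigger as $$
    begin
      -- materialize the tsvector once per insert/update instead of
      -- recomputing it from the jsonb at query time
      new.name_tsv := to_tsvector('english', new.meta->>'name');
      return new;
    end;
    $$ language plpgsql;

    create trigger docs_tsv_trg before insert or update on docs
      for each row execute procedure docs_tsv_update();

Queries then hit the materialized column directly:

    select id from docs where name_tsv @@ to_tsquery('english', 'john');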
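For the explain output, the buffers option shows how many blocks were
served from shared_buffers versus read in, which is exactly what matters
when investigating cold-cache behaviour:

    explain (analyze, buffers)
    select id from docs where meta @> '{"age": 20}';

Look for the "Buffers: shared hit=... read=..." lines; a large read count
on the bitmap index/heap scan nodes means those pages were cold.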
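As for warming up permanently: pg_prewarm() works per relation, so the
table and every index have to be prewarmed individually, e.g.

    create extension if not exists pg_prewarm;

    select pg_prewarm('docs');              -- the heap
    select pg_prewarm('docs_meta_idx');     -- jsonb_path_ops index
    select pg_prewarm('docs_name_idx');     -- name tsvector index
    select pg_prewarm('docs_address_idx');  -- address tsvector index

The default 'buffer' mode loads into shared_buffers; the 'prefetch' and
'read' modes warm the OS cache instead. As far as I know, nothing in 9.5
persists this across a restart, so after your drop_caches sequence both
caches start empty and the calls have to be re-run, e.g. from a startup
script.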