Re: Slow "not in array" operation

2019-11-13 Thread Marco Colli
Wow! Thank you very much Jeff!! I am really grateful. Thanks to the btree (instead of gin) the query now takes about 500ms instead of 70s. Il Mer 13 Nov 2019, 13:18 Jeff Janes ha scritto: > On Wed, Nov 13, 2019 at 6:56 AM Marco Colli > wrote: > >> > the answer is that is because it is a GIN

Re: Slow "not in array" operation

2019-11-13 Thread Jeff Janes
On Wed, Nov 13, 2019 at 6:56 AM Marco Colli wrote: > > the answer is that is because it is a GIN index. Make the same index > only as btree, and you should get good performance as it can filter the > tags within a given project without visiting the table. > > Currently I have this GIN index: >

Re: Slow "not in array" operation

2019-11-13 Thread Marco Colli
> the answer is that is because it is a GIN index. Make the same index only as btree, and you should get good performance as it can filter the tags within a given project without visiting the table. Currently I have this GIN index: "index_subscriptions_on_project_id_and_tags" gin (project_id,

Re: Slow "not in array" operation

2019-11-13 Thread Jeff Janes
On Wed, Nov 13, 2019 at 4:20 AM Marco Colli wrote: > Replying to the previous questions: > - work_mem = 64MB (there are hundreds of connections) > - the project 123 has more than 7M records, and those that don't have the > tag 'en' are 4.8M > > >> What was the plan for the one that took 500ms? >

Re: Slow "not in array" operation

2019-11-13 Thread Rick Otten
On Wed, Nov 13, 2019 at 5:47 AM Morris de Oryx wrote: > Disclaimer: Out over my skis again. > > From what you say here, and over on SO, it sounds like you've got two > problems: > > * Matching on *huge *numbers of records because of common tags. > > * A dynamic collection of tags as they're

Re: Slow "not in array" operation

2019-11-13 Thread Morris de Oryx
Disclaimer: Out over my skis again. >From what you say here, and over on SO, it sounds like you've got two problems: * Matching on *huge *numbers of records because of common tags. * A dynamic collection of tags as they're customer driven/configured. An "ideal" solution might look like a

Re: Slow "not in array" operation

2019-11-13 Thread Marco Colli
Replying to the previous questions: - work_mem = 64MB (there are hundreds of connections) - the project 123 has more than 7M records, and those that don't have the tag 'en' are 4.8M > What was the plan for the one that took 500ms? This is the query / plan without the filter on tags: SELECT

Re: Slow "not in array" operation

2019-11-12 Thread Jeff Janes
> > > 3) Here's the query plan that I get after disabling the seq scan: > > > QUERY PLAN > > > > ---

Re: Slow "not in array" operation

2019-11-12 Thread Marco Colli
I am not a PostgreSQL expert, however I think that the following algorithm should be possible and fast: 1. find the bitmap of all subscriptions in a project that are not trashed (it can use the index and takes only ~500ms) 2. find the bitmap of all subscriptions that match the above condition and

Re: Slow "not in array" operation

2019-11-12 Thread Tom Lane
Marco Colli writes: > 3) Here's the query plan that I get after disabling the seq scan: > Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual > time=94972.253..94972.254 rows=1 loops=1) So, this is slower than the seqscan, which means the planner made the right choice.

Re: Slow "not in array" operation

2019-11-12 Thread Michael Lewis
Odd index choice by the optimizer given what is available. The bitmap being lossy means more work_mem is needed if I remember properly. It is interesting that skipping the where condition on the array is only half a second. Is the array being toasted or is it small and being stored in the same

Re: Slow "not in array" operation

2019-11-12 Thread Marco Colli
1) It is running on a DigitalOcean CPU-optimized droplet with dedicated hyperthreads (16 cores) and SSD. SHOW random_page_cost; => 2 2) What config names should I check exactly? I used some suggestions from the online PGTune, when I first configured the db some months ago: max_worker_processes =

Re: Slow "not in array" operation

2019-11-12 Thread Justin Pryzby
On Tue, Nov 12, 2019 at 12:20:10PM -0700, Michael Lewis wrote: > It is very interesting to me that the optimizer chose a parallel sequential > scan rather than an index scan on either of your indexes that start > with project_id that also reference trashed_at. Maybe because of low correlation on

Re: Slow "not in array" operation

2019-11-12 Thread Michael Lewis
It is very interesting to me that the optimizer chose a parallel sequential scan rather than an index scan on either of your indexes that start with project_id that also reference trashed_at. 1) Are you running on SSD type storage? Has random_page_cost been lowered to 1-1.5 or so (close to 1

Re: Slow "not in array" operation

2019-11-12 Thread Marco Colli
To be honest, I have simplified the question above. In order to show you the plan, I must show you the actual query, which is this: === QUERY === SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL AND NOT (tags @>

Re: Slow "not in array" operation

2019-11-12 Thread Michael Lewis
What's the plan for the slow one? What's the time to just count all rows? >

Slow "not in array" operation

2019-11-12 Thread Marco Colli
I have a large table with millions of rows. Each row has an array field "tags". I also have the proper GIN index on tags. Counting the rows that have a tag is fast (~7s): SELECT COUNT(*) FROM "subscriptions" WHERE (tags @> ARRAY['t1']::varchar[]); However counting the rows that don't have a tag