Wow! Thank you very much Jeff!! I am really grateful.
Thanks to the btree (instead of gin) the query now takes about 500ms
instead of 70s.
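
For reference, the btree variant that produced this speedup might look something like the following. This is only a sketch: the exact column list is an assumption based on the GIN index named later in the thread, and the new index name is hypothetical.

```sql
-- Hypothetical: replace the GIN index with a plain btree on the same
-- columns, so the planner can filter tags within a project via an
-- index(-only) scan without visiting the table.
DROP INDEX IF EXISTS index_subscriptions_on_project_id_and_tags;

CREATE INDEX index_subscriptions_on_project_id_and_tags_btree
    ON subscriptions USING btree (project_id, tags);
```

Note that a btree on an array column does not support the `@>` containment operator the way GIN does; the win here comes from the planner filtering the tag condition against the narrower index tuples instead of the heap.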
On Wed, Nov 13, 2019, 13:18 Jeff Janes wrote:
On Wed, Nov 13, 2019 at 6:56 AM Marco Colli wrote:

the answer is that is because it is a GIN index. Make the same index only
as btree, and you should get good performance as it can filter the tags
within a given project without visiting the table.

Currently I have this GIN index:
"index_subscriptions_on_project_id_and_tags" gin (project_id,
On Wed, Nov 13, 2019 at 4:20 AM Marco Colli wrote:
> Replying to the previous questions:
> - work_mem = 64MB (there are hundreds of connections)
> - the project 123 has more than 7M records, and those that don't have the
> tag 'en' are 4.8M
>
>> What was the plan for the one that took 500ms?
On Wed, Nov 13, 2019 at 5:47 AM Morris de Oryx wrote:

Disclaimer: Out over my skis again.

From what you say here, and over on SO, it sounds like you've got two
problems:

* Matching on *huge* numbers of records because of common tags.

* A dynamic collection of tags as they're customer driven/configured.

An "ideal" solution might look like a
Replying to the previous questions:
- work_mem = 64MB (there are hundreds of connections)
- the project 123 has more than 7M records, and those that don't have the
tag 'en' are 4.8M
> What was the plan for the one that took 500ms?
This is the query / plan without the filter on tags:
SELECT
> 3) Here's the query plan that I get after disabling the seq scan:
> [query plan truncated in the archive]
I am not a PostgreSQL expert, however I think that the following
algorithm should be possible and fast:
1. find the bitmap of all subscriptions in a project that are not trashed
(it can use the index and takes only ~500ms)
2. find the bitmap of all subscriptions that match the above condition and
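
The two-bitmap idea sketched above can also be expressed as plain SQL by subtracting one count from the other. This is only an illustration of the idea, assuming the table, project id, and tag value that appear elsewhere in the thread:

```sql
-- Step 1: count non-trashed subscriptions in the project (the fast part).
-- Step 2: count those that DO have the tag (index-friendly containment).
-- "Rows without the tag" is then the difference of the two counts.
SELECT
  (SELECT COUNT(*) FROM subscriptions
    WHERE project_id = 123 AND trashed_at IS NULL)
  -
  (SELECT COUNT(*) FROM subscriptions
    WHERE project_id = 123 AND trashed_at IS NULL
      AND tags @> ARRAY['en']::varchar[]) AS count_without_tag;
```

Both subqueries are positive-match conditions, so each can use an index; the negation is done in arithmetic rather than in the WHERE clause.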
Marco Colli writes:
> 3) Here's the query plan that I get after disabling the seq scan:
> Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual
> time=94972.253..94972.254 rows=1 loops=1)
So, this is slower than the seqscan, which means the planner made the
right choice.
Odd index choice by the optimizer given what is available. The bitmap being
lossy means more work_mem is needed if I remember properly.
It is interesting that skipping the where condition on the array is only
half a second. Is the array being toasted or is it small and being stored
in the same
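
One way to answer the toast question above is with standard catalog functions; a sketch, assuming the `subscriptions`/`tags` names from the query in this thread:

```sql
-- Average on-disk size of the tags column; values well under ~2 kB
-- generally mean the arrays are stored inline in the heap tuple.
SELECT avg(pg_column_size(tags)) AS avg_tags_bytes
FROM subscriptions;

-- Storage strategy for the column ('x' = extended, i.e. TOAST-able):
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'subscriptions'::regclass
  AND attname = 'tags';
```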
1) It is running on a DigitalOcean CPU-optimized droplet with dedicated
hyperthreads (16 cores) and SSD.
SHOW random_page_cost; => 2
2) What config names should I check exactly? I used some suggestions from
the online PGTune, when I first configured the db some months ago:
max_worker_processes =
On Tue, Nov 12, 2019 at 12:20:10PM -0700, Michael Lewis wrote:
> It is very interesting to me that the optimizer chose a parallel sequential
> scan rather than an index scan on either of your indexes that start
> with project_id that also reference trashed_at.
Maybe because of low correlation on
It is very interesting to me that the optimizer chose a parallel sequential
scan rather than an index scan on either of your indexes that start
with project_id that also reference trashed_at.
1) Are you running on SSD type storage? Has random_page_cost been lowered
to 1-1.5 or so (close to 1
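
Lowering the cost as suggested can be tried per-session first, then made persistent if it improves the plan. A sketch; 1.1 is just an example value within the suggested 1-1.5 range:

```sql
-- Try in the current session to see whether the plan changes:
SET random_page_cost = 1.1;

-- If it helps, persist it and reload the configuration:
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```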
To be honest, I have simplified the question above. In order to show you
the plan, I must show you the actual query, which is this:
=== QUERY ===
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" =
123 AND "subscriptions"."trashed_at" IS NULL AND NOT (tags @>
What's the plan for the slow one? What's the time to just count all rows?
I have a large table with millions of rows. Each row has an array field
"tags". I also have the proper GIN index on tags.
Counting the rows that have a tag is fast (~7s):
SELECT COUNT(*) FROM "subscriptions" WHERE (tags @> ARRAY['t1']::varchar[]);
However counting the rows that don't have a tag
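
The slow counterpart presumably negates the containment test, as in the fuller query shown earlier in the thread; a sketch based on the fast query above:

```sql
-- The negated containment cannot be answered by the GIN index directly,
-- so the planner tends to fall back to scanning far more rows than the
-- positive @> test does.
SELECT COUNT(*) FROM "subscriptions"
WHERE NOT (tags @> ARRAY['t1']::varchar[]);
```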