Re: [External] Join queries slow with predicate, limit, and ordering

Aufar Gilbran Tue, 03 Dec 2019 00:52:50 -0800

Thanks for the answer!

On Tue, Dec 3, 2019 at 8:39 AM Jeff Janes <[email protected]> wrote:
> What happens if you set enable_sort to off before running it?


Turning enable_sort to off makes the first query to not sort[1]. It
does run much slower though compared to the original query[2]. This
time I do VACUUM ANALYZE first so even the slow query is much faster,
but still much slower than the fast query[3].

> It thinks it will find 1 row, and actually finds 89,222.  I don't know 
> exactly why that would be, I suppose tag_id has an extremely skewed 
> distribution.  But yeah, that is going to cause some problems.  For one 
> thing, if there was actually just one qualifying row, then it wouldn't get to 
> stop early, as the LIMIT would never be satisfied.  So it thinks that if it 
> choose to walk the index backwards, it would have to walk the **entire** 
> index.

I'm not really sure what skewed distribution is. If by skewed you mean
that for a particular tag_id there are many entity and other tag_id
there might be low amount entity then yes, this particular key value
covers 80% of the entity. For this kind of dataset, is there any way
that I can do to improve it or is it just impossible?

> With this query, it can use the join condition to transfer the knowledge of 
> tag.id=24 to become entity_tag.tag_id=24, and then look up stats on 
> entity_tag.tag_id for the value 24.  When you specify the single row of tag 
> indirectly, it can't do that as it doesn't know what specific value of tag.id 
> is going to be the one it finds (until after the query is done being planned 
> and starts executing, at which point it is too late).  But the row with id=24 
> doesn't seem to be the same one with "tag.key = 'status' AND tag.value = 
> 'SUCCEEDED'", so you have basically changed the query entirely on us.

Apologies, I used the query for database on another environment
previously. The correct one uses tag_id=18 [3]. So it becomes like
this:

SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.id = 18
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

It's still very fast and the query plan looks similar to me.

> If you replanned this query with ORDER BY entity.id+0 DESC, (and with the 
> true value of tag_id) that might give you some more insight into the hidden 
> "thought process" behind the planner.

I tried this on the fast query and it becomes very slow [4]. I guess
because it cannot consult the index for the ordering anymore so it
can't do LIMIT? I'm not so sure.

[1] https://explain.depesz.com/s/aEmR
[2] https://explain.depesz.com/s/kmNY
[3] https://explain.depesz.com/s/pD5v
[4] https://explain.depesz.com/s/4s7Q

--
Best regards,

Aufar Gilbran

-- 
*_Grab is hiring. Learn more at _**https://grab.careers 
<https://grab.careers/>*


By communicating with Grab Inc and/or its 
subsidiaries, associate companies and jointly controlled entities (“Grab 
Group”), you are deemed to have consented to the processing of your 
personal data as set out in the Privacy Notice which can be viewed at 
https://grab.com/privacy/ <https://grab.com/privacy/>


This email contains 
confidential information and is only for the intended recipient(s). If you 
are not the intended recipient(s), please do not disseminate, distribute or 
copy this email Please notify Grab Group immediately if you have received 
this by mistake and delete this email from your system. Email transmission 
cannot be guaranteed to be secure or error-free as any information therein 
could be intercepted, corrupted, lost, destroyed, delayed or incomplete, or 
contain viruses. Grab Group do not accept liability for any errors or 
omissions in the contents of this email arises as a result of email 
transmission. All intellectual property rights in this email and 
attachments therein shall remain vested in Grab Group, unless otherwise 
provided by law.

Re: [External] Join queries slow with predicate, limit, and ordering

Reply via email to