Arnt,

>From a quick look, these queries use the IntegerSet bind from
db/query.cpp; if you've not made a patch already, I can probably roll
something fairly quickly to test this with as it looks like it could be
done as a self-contained patch without affecting anything else.

Let me know.

Jim


On 09/05/2015 14:50, Martin Rode wrote:
> Hi Arnt & all
>
> I digged into this a little further. Here are my oberservations:
>
> The query in question contains a 2.8M =ANY(ARRAY[....]) clause, it
> takes here about 10m to complete. Result Set is  about 360k rows.
>
> We are still on Postgresql 9.1. (Debian Wheezy), may be that is the
> reason I reckoned. But after setting some postgresql.conf settings
> (work_mem & friends, thx Axel) I still got bad query times.
>
> Query looks like this:
>
> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
> mm.mailbox=441 and not mm.deleted and mm.uid=any('{...}')".
>
> I tried optimizing this by adding an index:
>     "martin_mailbox_deleted" btree (mailbox, uid) WHERE NOT deleted
> not much improvement.
>
> EXPLAIN ANALYZE still looked bad:
>
> ---
>  Index Scan using martin_mailbox on mailbox_messages mm
> (cost=0.00..243326004.21 rows=386398 width=16) (actual
> time=2.161..1050004.868 rows=397336 loops=1)
>    Index Cond: (mailbox = 441)
>    Filter: ((NOT deleted) AND (uid = ANY ('{16480,16481,... add your
> 2.8MB here}'))
>  Total runtime: 1050230.858 ms  <- SEE THIS?
> ---
>
> After an optimization I found mentioned here:
> https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/,
> I changed the query to look like this:
>
> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
> mm.mailbox=441 and not mm.deleted and
> mm.uid=any(VALUES(16480),(16481)..)"
>
> Apparently the VALUES trick builds a temp table which make the query a
> lot easier to digest for Postgres.
>
> I got much better times (236x faster!):
>
> ---
> Nested Loop  (cost=6621.38..6771.33 rows=194822 width=16) (actual
> time=616.886..4312.135 rows=397336 loops=1)
>    ->  HashAggregate  (cost=6621.38..6623.38 rows=200 width=4) (actual
> time=616.693..1010.668 rows=441425 loops=1)
>          ->  Values Scan on "*VALUES*"  (cost=0.00..5517.81
> rows=441425 width=4) (actual time=0.006..214.002 rows=441425 loops=1)
>    ->  Index Scan using martin_mailbox_deleted on mailbox_messages mm 
> (cost=0.00..0.69 rows=4 width=16) (actual time=0.006..0.006 rows=1
> loops=441425)
>          Index Cond: ((mailbox = 441) AND (uid = "*VALUES*".column1))
>  Total runtime: 4446.153 ms
> ---
>
> I think this easy optimization will do AOX overall good, I see similar
> queries all the time at our server, averaging at about 2000ms.
>
> A patch would be very much welcome :-)
>
>

Reply via email to