OK, this is *untested*, but my thoughts are below. It should be
minimally disruptive to the source code; however, because it adds a
parameter with a default (preserving the existing behaviour) to
IntegerSet::csl, it does mean most files get touched during a recompile.
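
To give a feel for the intent (untested, and the expected strings in the
comments are just what I read off the patch, not something I have run):

    #include "integerset.h"

    static EString example()
    {
        IntegerSet s;
        s.add( 16480 );
        s.add( 16481 );
        // s.csl() keeps the existing behaviour and returns "16480,16481",
        // which Query::bind() then wraps as "{16480,16481}".
        // s.csl( true ) returns "VALUES(16480),VALUES(16481)" instead.
        return s.csl( true );
    }
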
I need to set up somewhere I can test this, as I already have incomplete
changes to query.cpp in my development tree, so I cannot use my current
test environment until I fork what I have.
If you have a test environment you can run this in, please let me know;
otherwise it will take me a few days.
Happy to discuss if that helps.
Jim
--- ../archiveopteryx-3.2.0/db/query.cpp 2014-03-10 09:44:59.000000000 -0400
+++ db/query.cpp 2015-05-09 11:07:44.619422000 -0400
@@ -317,9 +317,10 @@
void Query::bind( uint n, const class IntegerSet & set )
{
if ( d->format == Text ) {
- EString s( "{" );
- s.append( set.csl() );
- s.append( "}" );
+ // PostgreSQL handles a VALUES(x),VALUES(y),...,VALUES(z) list much more efficiently than an array literal "{x,y,...,z}".
+ // See https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
+ EString s( "" );
+ s.append( set.csl( true ) );
bind( n, s );
}
else {
@@ -855,6 +856,19 @@
return c->bi;
}
--- ../archiveopteryx-3.2.0/server/integerset.cpp 2014-03-10 09:44:59.000000000 -0400
+++ server/integerset.cpp 2015-05-09 11:08:33.355422000 -0400
@@ -542,7 +542,7 @@
decimal numbers.
*/
-EString IntegerSet::csl() const
+EString IntegerSet::csl( bool asvalues ) const
{
EString r;
r.reserve( 2222 );
@@ -558,7 +558,13 @@
if ( b & ( 1 << j ) ) {
if ( !r.isEmpty() )
r.append( ',' );
+ if ( asvalues )
+     r.append( "VALUES(" );
+
r.appendNumber( it->start + n * BitsPerUint + j );
+
+ if ( asvalues )
+     r.append( ')' );
}
j++;
}
--- ../archiveopteryx-3.2.0/server/integerset.h 2014-03-10 09:44:59.000000000 -0400
+++ server/integerset.h 2015-05-09 11:07:08.567422000 -0400
@@ -27,7 +27,7 @@
uint index( uint ) const;
EString set() const;
- EString csl() const;
+ EString csl( bool asvalues = false ) const;
void add( uint, uint );
void add( uint n ) { add( n, n ); }
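
For context, the call sites this is aimed at go through the IntegerSet
overload of Query::bind() patched above. A rough sketch of such a call site
(illustrative only: the query text is paraphrased from Martin's example
below, and I am assuming the usual Query constructor and integer bind()
overloads rather than quoting real code from the tree):

    #include "query.h"
    #include "integerset.h"

    // Illustrative sketch, not an extract from the source tree.
    static Query * exampleQuery( uint mailbox, const IntegerSet & uids )
    {
        Query * q = new Query( "select mm.uid, mm.modseq, mm.message "
                               "from mailbox_messages mm "
                               "where mm.mailbox=$1 and not mm.deleted "
                               "and mm.uid=any($2)", 0 );
        q->bind( 1, mailbox );
        // This goes through the IntegerSet bind in db/query.cpp above.
        q->bind( 2, uids );
        return q;
    }
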
On 09/05/2015 15:58, NSS Ltd wrote:
> Arnt,
>
> From a quick look, these queries use the IntegerSet bind from
> db/query.cpp; if you've not made a patch already, I can probably roll
> something fairly quickly to test this with, as it looks like it could be
> done as a self-contained patch without affecting anything else.
>
> Let me know.
>
> Jim
>
>
> On 09/05/2015 14:50, Martin Rode wrote:
>> Hi Arnt & all
>>
>> I dug into this a little further. Here are my observations:
>>
>> The query in question contains a 2.8 MB =ANY(ARRAY[...]) clause; it
>> takes about 10 minutes to complete here. The result set is about 360k rows.
>>
>> We are still on PostgreSQL 9.1 (Debian Wheezy); maybe that is the
>> reason, I reckoned. But even after adjusting some postgresql.conf settings
>> (work_mem & friends, thx Axel) I still got bad query times.
>>
>> Query looks like this:
>>
>> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
>> mm.mailbox=441 and not mm.deleted and mm.uid=any('{...}')".
>>
>> I tried optimizing this by adding an index:
>> "martin_mailbox_deleted" btree (mailbox, uid) WHERE NOT deleted
>> Not much improvement.
>>
>> EXPLAIN ANALYZE still looked bad:
>>
>> ---
>> Index Scan using martin_mailbox on mailbox_messages mm
>> (cost=0.00..243326004.21 rows=386398 width=16) (actual
>> time=2.161..1050004.868 rows=397336 loops=1)
>> Index Cond: (mailbox = 441)
>> Filter: ((NOT deleted) AND (uid = ANY ('{16480,16481,... add your
>> 2.8MB here}'))
>> Total runtime: 1050230.858 ms <- SEE THIS?
>> ---
>>
>> After an optimization I found mentioned here:
>> https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/,
>> I changed the query to look like this:
>>
>> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
>> mm.mailbox=441 and not mm.deleted and
>> mm.uid=any(VALUES(16480),(16481)..)"
>>
>> Apparently the VALUES trick builds a temp table, which makes the query a
>> lot easier for Postgres to digest.
>>
>> I got much better times (236x faster!):
>>
>> ---
>> Nested Loop (cost=6621.38..6771.33 rows=194822 width=16) (actual
>> time=616.886..4312.135 rows=397336 loops=1)
>> -> HashAggregate (cost=6621.38..6623.38 rows=200 width=4) (actual
>> time=616.693..1010.668 rows=441425 loops=1)
>> -> Values Scan on "*VALUES*" (cost=0.00..5517.81
>> rows=441425 width=4) (actual time=0.006..214.002 rows=441425 loops=1)
>> -> Index Scan using martin_mailbox_deleted on mailbox_messages mm
>> (cost=0.00..0.69 rows=4 width=16) (actual time=0.006..0.006 rows=1
>> loops=441425)
>> Index Cond: ((mailbox = 441) AND (uid = "*VALUES*".column1))
>> Total runtime: 4446.153 ms
>> ---
>>
>> I think this easy optimization will do AOX a lot of good overall; I see
>> similar queries all the time on our server, averaging about 2000 ms.
>>
>> A patch would be very much welcome :-)
>>
>>