OK, this is *untested* but my thoughts are below.  It should be
minimally disruptive to the source code but because it adds a parameter
with a default (for the existing behaviour) to IntegerSet::csl, it does
mean it touches most files during a compile.

I need to get something I can test this in as I already have changes in
my development tree to query.cpp which are not complete, so I cannot use
my current test environment until I fork what I have.

If you have a test environment you can run this in, please let me know
as otherwise it will take me a few days.

Happy to discuss if that helps.

Jim


--- ../archiveopteryx-3.2.0/db/query.cpp    2014-03-10
09:44:59.000000000 -0400
+++ db/query.cpp    2015-05-09 11:07:44.619422000 -0400
@@ -317,9 +317,10 @@
 void Query::bind( uint n, const class IntegerSet & set )
 {
     if ( d->format == Text ) {
-        EString s( "{" );
-        s.append( set.csl() );
-        s.append( "}" );
+        // PostgreSQL Is More Efficient By Using A
VALUES(x),VALUES(y)...VALUES(z)
+        // List Instead Of An Array List "{x,y,...,z}"
+        // See
https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
+        EString s( "" );
+        s.append( set.csl(true) );
         bind( n, s );
     }
     else {
@@ -855,6 +856,19 @@
     return c->bi;
 }
 
--- ../archiveopteryx-3.2.0/server/integerset.cpp    2014-03-10
09:44:59.000000000 -0400
+++ server/integerset.cpp    2015-05-09 11:08:33.355422000 -0400
@@ -542,7 +542,7 @@
     decimal numbers.
 */
 
-EString IntegerSet::csl() const
+EString IntegerSet::csl(bool asvalues=false) const
 {
     EString r;
     r.reserve( 2222 );
@@ -558,7 +558,13 @@
                     if ( b & ( 1 << j ) ) {
                         if ( !r.isEmpty() )
                             r.append( ',' );
+                        if (asvalues)
+                            r.append( "VALUES(");
+                           
                         r.appendNumber( it->start + n * BitsPerUint + j );
+                       
+                        if (asvalues)
+                            r.append( ')');
                     }
                     j++;
                 }

--- ../archiveopteryx-3.2.0/server/integerset.h    2014-03-10
09:44:59.000000000 -0400
+++ server/integerset.h    2015-05-09 11:07:08.567422000 -0400
@@ -27,7 +27,7 @@
     uint index( uint ) const;
 
     EString set() const;
-    EString csl() const;
+    EString csl(bool) const;
 
     void add( uint, uint );
     void add( uint n ) { add( n, n ); }


On 09/05/2015 15:58, NSS Ltd wrote:
> Arnt,
>
> >From a quick look, these queries use the IntegerSet bind from
> db/query.cpp; if you've not made a patch already, I can probably roll
> something fairly quickly to test this with as it looks like it could be
> done as a self-contained patch without affecting anything else.
>
> Let me know.
>
> Jim
>
>
> On 09/05/2015 14:50, Martin Rode wrote:
>> Hi Arnt & all
>>
>> I digged into this a little further. Here are my oberservations:
>>
>> The query in question contains a 2.8M =ANY(ARRAY[....]) clause, it
>> takes here about 10m to complete. Result Set is  about 360k rows.
>>
>> We are still on Postgresql 9.1. (Debian Wheezy), may be that is the
>> reason I reckoned. But after setting some postgresql.conf settings
>> (work_mem & friends, thx Axel) I still got bad query times.
>>
>> Query looks like this:
>>
>> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
>> mm.mailbox=441 and not mm.deleted and mm.uid=any('{...}')".
>>
>> I tried optimizing this by adding an index:
>>     "martin_mailbox_deleted" btree (mailbox, uid) WHERE NOT deleted
>> not much improvement.
>>
>> EXPLAIN ANALYZE still looked bad:
>>
>> ---
>>  Index Scan using martin_mailbox on mailbox_messages mm
>> (cost=0.00..243326004.21 rows=386398 width=16) (actual
>> time=2.161..1050004.868 rows=397336 loops=1)
>>    Index Cond: (mailbox = 441)
>>    Filter: ((NOT deleted) AND (uid = ANY ('{16480,16481,... add your
>> 2.8MB here}'))
>>  Total runtime: 1050230.858 ms  <- SEE THIS?
>> ---
>>
>> After an optimization I found mentioned here:
>> https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/,
>> I changed the query to look like this:
>>
>> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
>> mm.mailbox=441 and not mm.deleted and
>> mm.uid=any(VALUES(16480),(16481)..)"
>>
>> Apparently the VALUES trick builds a temp table which make the query a
>> lot easier to digest for Postgres.
>>
>> I got much better times (236x faster!):
>>
>> ---
>> Nested Loop  (cost=6621.38..6771.33 rows=194822 width=16) (actual
>> time=616.886..4312.135 rows=397336 loops=1)
>>    ->  HashAggregate  (cost=6621.38..6623.38 rows=200 width=4) (actual
>> time=616.693..1010.668 rows=441425 loops=1)
>>          ->  Values Scan on "*VALUES*"  (cost=0.00..5517.81
>> rows=441425 width=4) (actual time=0.006..214.002 rows=441425 loops=1)
>>    ->  Index Scan using martin_mailbox_deleted on mailbox_messages mm 
>> (cost=0.00..0.69 rows=4 width=16) (actual time=0.006..0.006 rows=1
>> loops=441425)
>>          Index Cond: ((mailbox = 441) AND (uid = "*VALUES*".column1))
>>  Total runtime: 4446.153 ms
>> ---
>>
>> I think this easy optimization will do AOX overall good, I see similar
>> queries all the time at our server, averaging at about 2000ms.
>>
>> A patch would be very much welcome :-)
>>
>>

Reply via email to