Oops, just checked: default argument values go in the declaration, not
the implementation (I suspect gcc copes with it either way).  So the
patch for IntegerSet should be:

--- ../archiveopteryx-3.2.0/server/integerset.h    2014-03-10 09:44:59.000000000 -0400
+++ server/integerset.h    2015-05-09 11:32:54.719422000 -0400
@@ -27,7 +27,7 @@
     uint index( uint ) const;
 
     EString set() const;
-    EString csl() const;
+    EString csl( bool asvalues = false ) const;
 
     void add( uint, uint );
     void add( uint n ) { add( n, n ); }

--- ../archiveopteryx-3.2.0/server/integerset.cpp    2014-03-10 09:44:59.000000000 -0400
+++ server/integerset.cpp    2015-05-09 11:32:45.027422000 -0400
@@ -542,7 +542,7 @@
     decimal numbers.
 */
 
-EString IntegerSet::csl() const
+EString IntegerSet::csl( bool asvalues ) const
 {
     EString r;
     r.reserve( 2222 );
@@ -558,7 +558,13 @@
                     if ( b & ( 1 << j ) ) {
                         if ( !r.isEmpty() )
                             r.append( ',' );
+                        else if ( asvalues )
+                            r.append( "VALUES" );
+                        if ( asvalues )
+                            r.append( '(' );
                         r.appendNumber( it->start + n * BitsPerUint + j );
+                        if ( asvalues )
+                            r.append( ')' );
                     }
                     j++;
                 }
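As a sanity check on the format csl(true) is meant to produce, here is a hypothetical standalone sketch of the formatting logic, using std::string and std::vector in place of EString and IntegerSet (names are illustrative, not the Archiveopteryx implementation). Note that PostgreSQL's row-list syntax writes the keyword once, i.e. VALUES (16480),(16481), not VALUES(16480),VALUES(16481).

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: format a list of uids either as a plain
// comma-separated list ("16480,16481") or as a PostgreSQL VALUES
// row list ("VALUES(16480),(16481)").
std::string csl( const std::vector<unsigned> & numbers,
                 bool asvalues = false )
{
    std::string r;
    for ( unsigned n : numbers ) {
        if ( !r.empty() )
            r += ',';
        else if ( asvalues )
            r += "VALUES";          // emit the keyword once, before the first row
        if ( asvalues )
            r += '(';
        r += std::to_string( n );
        if ( asvalues )
            r += ')';
    }
    return r;
}
```

With asvalues=false this reproduces the existing comma-separated list; with asvalues=true the first element carries the VALUES keyword and the rest are bare parenthesised rows.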


On 09/05/2015 16:22, NSS Ltd wrote:
> OK, this is *untested*, but my thoughts are below.  It should be
> minimally disruptive to the source code, but because it adds a parameter
> with a default (preserving the existing behaviour) to IntegerSet::csl,
> it does touch most files during a compile.
>
> I need to set up something I can test this in: I already have incomplete
> changes to query.cpp in my development tree, so I cannot use my current
> test environment until I fork what I have.
>
> If you have a test environment you can run this in, please let me know
> as otherwise it will take me a few days.
>
> Happy to discuss if that helps.
>
> Jim
>
>
> --- ../archiveopteryx-3.2.0/db/query.cpp    2014-03-10 09:44:59.000000000 -0400
> +++ db/query.cpp    2015-05-09 11:07:44.619422000 -0400
> @@ -317,9 +317,10 @@
>  void Query::bind( uint n, const class IntegerSet & set )
>  {
>      if ( d->format == Text ) {
> -        EString s( "{" );
> -        s.append( set.csl() );
> -        s.append( "}" );
> +        // PostgreSQL is more efficient using a VALUES(x),VALUES(y)...VALUES(z)
> +        // list instead of an array list "{x,y,...,z}".  See
> +        // https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
> +        EString s( "" );
> +        s.append( set.csl(true) );
>          bind( n, s );
>      }
>      else {
> @@ -855,6 +856,19 @@
>      return c->bi;
>  }
>  
> --- ../archiveopteryx-3.2.0/server/integerset.cpp    2014-03-10 09:44:59.000000000 -0400
> +++ server/integerset.cpp    2015-05-09 11:08:33.355422000 -0400
> @@ -542,7 +542,7 @@
>      decimal numbers.
>  */
>  
> -EString IntegerSet::csl() const
> +EString IntegerSet::csl(bool asvalues=false) const
>  {
>      EString r;
>      r.reserve( 2222 );
> @@ -558,7 +558,13 @@
>                      if ( b & ( 1 << j ) ) {
>                          if ( !r.isEmpty() )
>                              r.append( ',' );
> +                        if (asvalues)
> +                            r.append( "VALUES(");
> +                           
>                          r.appendNumber( it->start + n * BitsPerUint + j );
> +                       
> +                        if (asvalues)
> +                            r.append( ')');
>                      }
>                      j++;
>                  }
>
> --- ../archiveopteryx-3.2.0/server/integerset.h    2014-03-10 09:44:59.000000000 -0400
> +++ server/integerset.h    2015-05-09 11:07:08.567422000 -0400
> @@ -27,7 +27,7 @@
>      uint index( uint ) const;
>  
>      EString set() const;
> -    EString csl() const;
> +    EString csl(bool) const;
>  
>      void add( uint, uint );
>      void add( uint n ) { add( n, n ); }
>
>
> On 09/05/2015 15:58, NSS Ltd wrote:
>> Arnt,
>>
>> From a quick look, these queries use the IntegerSet bind from
>> db/query.cpp; if you've not made a patch already, I can probably roll
>> something fairly quickly to test this with as it looks like it could be
>> done as a self-contained patch without affecting anything else.
>>
>> Let me know.
>>
>> Jim
>>
>>
>> On 09/05/2015 14:50, Martin Rode wrote:
>>> Hi Arnt & all
>>>
>>> I dug into this a little further. Here are my observations:
>>>
>>> The query in question contains a 2.8 MB =ANY(ARRAY[....]) clause; it
>>> takes about 10 minutes here to complete. The result set is about 360k rows.
>>>
>>> We are still on PostgreSQL 9.1 (Debian Wheezy); maybe that is the
>>> reason, I reckoned. But even after adjusting some postgresql.conf
>>> settings (work_mem & friends, thx Axel) I still got bad query times.
>>>
>>> Query looks like this:
>>>
>>> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
>>> mm.mailbox=441 and not mm.deleted and mm.uid=any('{...}')".
>>>
>>> I tried optimizing this by adding a partial index:
>>>     "martin_mailbox_deleted" btree (mailbox, uid) WHERE NOT deleted
>>> but that brought not much improvement.
>>>
>>> EXPLAIN ANALYZE still looked bad:
>>>
>>> ---
>>>  Index Scan using martin_mailbox on mailbox_messages mm
>>> (cost=0.00..243326004.21 rows=386398 width=16) (actual
>>> time=2.161..1050004.868 rows=397336 loops=1)
>>>    Index Cond: (mailbox = 441)
>>>    Filter: ((NOT deleted) AND (uid = ANY ('{16480,16481,... add your
>>> 2.8MB here}')))
>>>  Total runtime: 1050230.858 ms  <- SEE THIS?
>>> ---
>>>
>>> After an optimization I found mentioned here:
>>> https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/,
>>> I changed the query to look like this:
>>>
>>> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
>>> mm.mailbox=441 and not mm.deleted and
>>> mm.uid=any(VALUES(16480),(16481)..)"
>>>
>>> Apparently the VALUES trick builds a temp table, which makes the query
>>> a lot easier for Postgres to digest.
>>>
>>> I got much better times (236x faster!):
>>>
>>> ---
>>> Nested Loop  (cost=6621.38..6771.33 rows=194822 width=16) (actual
>>> time=616.886..4312.135 rows=397336 loops=1)
>>>    ->  HashAggregate  (cost=6621.38..6623.38 rows=200 width=4) (actual
>>> time=616.693..1010.668 rows=441425 loops=1)
>>>          ->  Values Scan on "*VALUES*"  (cost=0.00..5517.81
>>> rows=441425 width=4) (actual time=0.006..214.002 rows=441425 loops=1)
>>>    ->  Index Scan using martin_mailbox_deleted on mailbox_messages mm 
>>> (cost=0.00..0.69 rows=4 width=16) (actual time=0.006..0.006 rows=1
>>> loops=441425)
>>>          Index Cond: ((mailbox = 441) AND (uid = "*VALUES*".column1))
>>>  Total runtime: 4446.153 ms
>>> ---
>>>
>>> I think this easy optimization will do AOX a lot of good overall; I see
>>> similar queries all the time on our server, averaging about 2000 ms.
>>>
>>> A patch would be very much welcome :-)
>>>
>>>
