I managed to squeeze this onto a server for a quick test, and the binding
causes a problem, so this is not going to work as-is. I'll take another
look, but the patch will be more complex and will take a bit more time.
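
Roughly, the issue as I understand it: the VALUES text still goes through
bind(), so Postgres receives it as a single text parameter, and a VALUES
list is SQL syntax rather than data (I also doubt the repeated VALUES
keyword would parse). A sketch, reusing Martin's query and assuming the
set is bound to a placeholder such as $1:

    -- as it stands, the set is bound as an array literal and this works:
    --   ... and mm.uid = any($1)   -- $1 = '{16480,16481,16482}'
    -- with the patch, $1 becomes the string 'VALUES(16480),VALUES(16481),...',
    -- which the server cannot treat as a row source; the list would have to
    -- be spliced into the statement text itself, along these lines:
    select mm.uid, mm.modseq, mm.message
      from mailbox_messages mm
     where mm.mailbox = 441 and not mm.deleted
       and mm.uid = any(values (16480), (16481), (16482));
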
Jim
On 09/05/2015 16:35, NSS Ltd wrote:
> Oops, I just checked and I think default argument values go in the
> declaration, not the implementation (I suspect gcc probably copes either
> way). So the patch for IntegerSet should be:
>
> --- ../archiveopteryx-3.2.0/server/integerset.h 2014-03-10 09:44:59.000000000 -0400
> +++ server/integerset.h 2015-05-09 11:32:54.719422000 -0400
> @@ -27,7 +27,7 @@
> uint index( uint ) const;
>
> EString set() const;
> - EString csl() const;
> + EString csl(bool asvalues = false) const;
>
> void add( uint, uint );
> void add( uint n ) { add( n, n ); }
>
> --- ../archiveopteryx-3.2.0/server/integerset.cpp 2014-03-10 09:44:59.000000000 -0400
> +++ server/integerset.cpp 2015-05-09 11:32:45.027422000 -0400
> @@ -542,7 +542,7 @@
> decimal numbers.
> */
>
> -EString IntegerSet::csl() const
> +EString IntegerSet::csl(bool asvalues) const
> {
> EString r;
> r.reserve( 2222 );
> @@ -558,7 +558,13 @@
> if ( b & ( 1 << j ) ) {
> if ( !r.isEmpty() )
> r.append( ',' );
> + if (asvalues)
> + r.append( "VALUES(");
> +
> r.appendNumber( it->start + n * BitsPerUint + j );
> +
> + if (asvalues)
> + r.append( ')');
> }
> j++;
> }
>
>
> On 09/05/2015 16:22, NSS Ltd wrote:
>> OK, this is *untested* but my thoughts are below. It should be
>> minimally disruptive to the source code, but because it adds a parameter
>> with a default (preserving the existing behaviour) to IntegerSet::csl, it
>> does mean a rebuild touches most files.
>>
>> I need to find something I can test this in, as I already have incomplete
>> changes to query.cpp in my development tree, so I cannot use my current
>> test environment until I fork what I have.
>>
>> If you have a test environment you can run this in, please let me know
>> as otherwise it will take me a few days.
>>
>> Happy to discuss if that helps.
>>
>> Jim
>>
>>
>> --- ../archiveopteryx-3.2.0/db/query.cpp 2014-03-10 09:44:59.000000000 -0400
>> +++ db/query.cpp 2015-05-09 11:07:44.619422000 -0400
>> @@ -317,9 +317,11 @@
>> void Query::bind( uint n, const class IntegerSet & set )
>> {
>> if ( d->format == Text ) {
>> - EString s( "{" );
>> - s.append( set.csl() );
>> - s.append( "}" );
>> + // PostgreSQL is more efficient by using a VALUES(x),VALUES(y)...VALUES(z)
>> + // list instead of an array list "{x,y,...,z}"
>> + // See https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
>> + EString s( "" );
>> + s.append( set.csl(true) );
>> bind( n, s );
>> }
>> else {
>> @@ -855,6 +856,19 @@
>> return c->bi;
>> }
>>
>> --- ../archiveopteryx-3.2.0/server/integerset.cpp 2014-03-10 09:44:59.000000000 -0400
>> +++ server/integerset.cpp 2015-05-09 11:08:33.355422000 -0400
>> @@ -542,7 +542,7 @@
>> decimal numbers.
>> */
>>
>> -EString IntegerSet::csl() const
>> +EString IntegerSet::csl(bool asvalues=false) const
>> {
>> EString r;
>> r.reserve( 2222 );
>> @@ -558,7 +558,13 @@
>> if ( b & ( 1 << j ) ) {
>> if ( !r.isEmpty() )
>> r.append( ',' );
>> + if (asvalues)
>> + r.append( "VALUES(");
>> +
>> r.appendNumber( it->start + n * BitsPerUint + j );
>> +
>> + if (asvalues)
>> + r.append( ')');
>> }
>> j++;
>> }
>>
>> --- ../archiveopteryx-3.2.0/server/integerset.h 2014-03-10 09:44:59.000000000 -0400
>> +++ server/integerset.h 2015-05-09 11:07:08.567422000 -0400
>> @@ -27,7 +27,7 @@
>> uint index( uint ) const;
>>
>> EString set() const;
>> - EString csl() const;
>> + EString csl(bool) const;
>>
>> void add( uint, uint );
>> void add( uint n ) { add( n, n ); }
>>
>>
>> On 09/05/2015 15:58, NSS Ltd wrote:
>>> Arnt,
>>>
>>> From a quick look, these queries use the IntegerSet bind from
>>> db/query.cpp. If you've not made a patch already, I can probably roll
>>> something fairly quickly to test this with, as it looks like it could be
>>> done as a self-contained patch without affecting anything else.
>>>
>>> Let me know.
>>>
>>> Jim
>>>
>>>
>>> On 09/05/2015 14:50, Martin Rode wrote:
>>>> Hi Arnt & all
>>>>
>>>> I dug into this a little further. Here are my observations:
>>>>
>>>> The query in question contains a 2.8 MB =ANY(ARRAY[....]) clause and
>>>> takes about 10 minutes to complete here. The result set is about 360k rows.
>>>>
>>>> We are still on PostgreSQL 9.1 (Debian Wheezy); I reckoned that might be
>>>> the reason. But even after adjusting some postgresql.conf settings
>>>> (work_mem & friends, thx Axel) I still got bad query times.
>>>>
>>>> Query looks like this:
>>>>
>>>> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
>>>> mm.mailbox=441 and not mm.deleted and mm.uid=any('{...}')".
>>>>
>>>> I tried optimizing this by adding an index:
>>>> "martin_mailbox_deleted" btree (mailbox, uid) WHERE NOT deleted
>>>> but that did not improve things much.
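>>>>
>>>> That is the psql \d shorthand; spelled out, the index would have been
>>>> created along these lines:
>>>>
>>>>   create index martin_mailbox_deleted
>>>>       on mailbox_messages using btree (mailbox, uid)
>>>>    where not deleted;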
>>>>
>>>> EXPLAIN ANALYZE still looked bad:
>>>>
>>>> ---
>>>> Index Scan using martin_mailbox on mailbox_messages mm
>>>> (cost=0.00..243326004.21 rows=386398 width=16) (actual
>>>> time=2.161..1050004.868 rows=397336 loops=1)
>>>> Index Cond: (mailbox = 441)
>>>> Filter: ((NOT deleted) AND (uid = ANY ('{16480,16481,... add your
>>>> 2.8MB here}'))
>>>> Total runtime: 1050230.858 ms <- SEE THIS?
>>>> ---
>>>>
>>>> After an optimization I found mentioned here:
>>>> https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/,
>>>> I changed the query to look like this:
>>>>
>>>> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
>>>> mm.mailbox=441 and not mm.deleted and
>>>> mm.uid=any(VALUES(16480),(16481)..)"
>>>>
>>>> Apparently the VALUES trick builds a temp table, which makes the query a
>>>> lot easier for Postgres to digest.
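>>>>
>>>> Roughly, it ends up behaving as if the query had been written as a join
>>>> against an inline table (a sketch with a short list; the ANY form also
>>>> de-duplicates the list, which is what the HashAggregate below is doing):
>>>>
>>>>   select mm.uid, mm.modseq, mm.message
>>>>     from mailbox_messages mm
>>>>     join (values (16480), (16481), (16482)) as v(uid) on v.uid = mm.uid
>>>>    where mm.mailbox = 441 and not mm.deleted;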
>>>>
>>>> I got much better times (236x faster!):
>>>>
>>>> ---
>>>> Nested Loop (cost=6621.38..6771.33 rows=194822 width=16) (actual
>>>> time=616.886..4312.135 rows=397336 loops=1)
>>>> -> HashAggregate (cost=6621.38..6623.38 rows=200 width=4) (actual
>>>> time=616.693..1010.668 rows=441425 loops=1)
>>>> -> Values Scan on "*VALUES*" (cost=0.00..5517.81
>>>> rows=441425 width=4) (actual time=0.006..214.002 rows=441425 loops=1)
>>>> -> Index Scan using martin_mailbox_deleted on mailbox_messages mm
>>>> (cost=0.00..0.69 rows=4 width=16) (actual time=0.006..0.006 rows=1
>>>> loops=441425)
>>>> Index Cond: ((mailbox = 441) AND (uid = "*VALUES*".column1))
>>>> Total runtime: 4446.153 ms
>>>> ---
>>>>
>>>> I think this easy optimization will do AOX a lot of good overall; I see
>>>> similar queries all the time on our server, averaging about 2000 ms.
>>>>
>>>> A patch would be very much welcome :-)
>>>>
>>>>