I managed to squeeze this onto a server for a quick test, and the binding
causes a problem, so this is not going to work as-is. I'll take another
look, but the patch will be more complex and will take a bit more time.
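
Roughly, the issue as I understand it: the VALUES text still goes through
bind(), so Postgres receives it as a single text parameter, and a VALUES
list is SQL syntax rather than data (I also doubt the repeated VALUES
keyword would parse). A sketch, reusing Martin's query and assuming the
set is bound to a placeholder such as $1:

    -- as it stands, the set is bound as an array literal and this works:
    --   ... and mm.uid = any($1)   -- $1 = '{16480,16481,16482}'
    -- with the patch, $1 becomes the string 'VALUES(16480),VALUES(16481),...',
    -- which the server cannot treat as a row source; the list would have to
    -- be spliced into the statement text itself, along these lines:
    select mm.uid, mm.modseq, mm.message
      from mailbox_messages mm
     where mm.mailbox = 441 and not mm.deleted
       and mm.uid = any(values (16480), (16481), (16482));
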
Jim
On 09/05/2015 16:35, NSS Ltd wrote:
> Oops, I just checked and I think default argument values go in the
> declaration, not the implementation (I suspect gcc probably copes either
> way). So the patch for IntegerSet should be:
>
> --- ../archiveopteryx-3.2.0/server/integerset.h 2014-03-10 09:44:59.000000000 -0400
> +++ server/integerset.h 2015-05-09 11:32:54.719422000 -0400
> @@ -27,7 +27,7 @@
> uint index( uint ) const;
>
> EString set() const;
> - EString csl() const;
> + EString csl(bool asvalues = false) const;
>
> void add( uint, uint );
> void add( uint n ) { add( n, n ); }
>
> --- ../archiveopteryx-3.2.0/server/integerset.cpp 2014-03-10 09:44:59.000000000 -0400
> +++ server/integerset.cpp 2015-05-09 11:32:45.027422000 -0400
> @@ -542,7 +542,7 @@
> decimal numbers.
> */
>
> -EString IntegerSet::csl() const
> +EString IntegerSet::csl(bool asvalues) const
> {
> EString r;
> r.reserve( 2222 );
> @@ -558,7 +558,13 @@
> if ( b & ( 1 << j ) ) {
> if ( !r.isEmpty() )
> r.append( ',' );
> + if (asvalues)
> + r.append( "VALUES(");
> +
> r.appendNumber( it->start + n * BitsPerUint + j );
> +
> + if (asvalues)
> + r.append( ')');
> }
> j++;
> }
>
>
> On 09/05/2015 16:22, NSS Ltd wrote:
>> OK, this is *untested* but my thoughts are below. It should be
>> minimally disruptive to the source code, but because it adds a parameter
>> with a default (preserving the existing behaviour) to IntegerSet::csl, it
>> does mean a rebuild touches most files.
>>
>> I need to find something I can test this in, as I already have incomplete
>> changes to query.cpp in my development tree, so I cannot use my current
>> test environment until I fork what I have.
>>
>> If you have a test environment you can run this in, please let me know
>> as otherwise it will take me a few days.
>>
>> Happy to discuss if that helps.
>>
>> Jim
>>
>>
>> --- ../archiveopteryx-3.2.0/db/query.cpp 2014-03-10 09:44:59.000000000 -0400
>> +++ db/query.cpp 2015-05-09 11:07:44.619422000 -0400
>> @@ -317,9 +317,11 @@
>> void Query::bind( uint n, const class IntegerSet & set )
>> {
>> if ( d->format == Text ) {
>> - EString s( "{" );
>> - s.append( set.csl() );
>> - s.append( "}" );
>> + // PostgreSQL is more efficient by using a VALUES(x),VALUES(y)...VALUES(z)
>> + // list instead of an array list "{x,y,...,z}"
>> + // See https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
>> + EString s( "" );
>> + s.append( set.csl(true) );
>> bind( n, s );
>> }
>> else {
>> @@ -855,6 +856,19 @@
>> return c->bi;
>> }
>>
>> --- ../archiveopteryx-3.2.0/server/integerset.cpp 2014-03-10 09:44:59.000000000 -0400
>> +++ server/integerset.cpp 2015-05-09 11:08:33.355422000 -0400
>> @@ -542,7 +542,7 @@
>> decimal numbers.
>> */
>>
>> -EString IntegerSet::csl() const
>> +EString IntegerSet::csl(bool asvalues=false) const
>> {
>> EString r;
>> r.reserve( 2222 );
>> @@ -558,7 +558,13 @@
>> if ( b & ( 1 << j ) ) {
>> if ( !r.isEmpty() )
>> r.append( ',' );
>> + if (asvalues)
>> + r.append( "VALUES(");
>> +
>> r.appendNumber( it->start + n * BitsPerUint + j );
>> +
>> + if (asvalues)
>> + r.append( ')');
>> }
>> j++;
>> }
>>
>> --- ../archiveopteryx-3.2.0/server/integerset.h 2014-03-10 09:44:59.000000000 -0400
>> +++ server/integerset.h 2015-05-09 11:07:08.567422000 -0400
>> @@ -27,7 +27,7 @@
>> uint index( uint ) const;
>>
>> EString set() const;
>> - EString csl() const;
>> + EString csl(bool) const;
>>
>> void add( uint, uint );
>> void add( uint n ) { add( n, n ); }
>>
>>
>> On 09/05/2015 15:58, NSS Ltd wrote:
>>> Arnt,
>>>
>>> From a quick look, these queries use the IntegerSet bind from
>>> db/query.cpp. If you've not made a patch already, I can probably roll
>>> something fairly quickly to test this with, as it looks like it could be
>>> done as a self-contained patch without affecting anything else.
>>>
>>> Let me know.
>>>
>>> Jim
>>>
>>>
>>> On 09/05/2015 14:50, Martin Rode wrote:
>>>> Hi Arnt & all
>>>>
>>>> I dug into this a little further. Here are my observations:
>>>>
>>>> The query in question contains a 2.8 MB =ANY(ARRAY[....]) clause and
>>>> takes about 10 minutes to complete here. The result set is about 360k rows.
>>>>
>>>> We are still on PostgreSQL 9.1 (Debian Wheezy); I reckoned that might be
>>>> the reason. But even after adjusting some postgresql.conf settings
>>>> (work_mem & friends, thx Axel) I still got bad query times.
>>>>
>>>> Query looks like this:
>>>>
>>>> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
>>>> mm.mailbox=441 and not mm.deleted and mm.uid=any('{...}')".
>>>>
>>>> I tried optimizing this by adding an index:
>>>> "martin_mailbox_deleted" btree (mailbox, uid) WHERE NOT deleted
>>>> but that did not improve things much.
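>>>>
>>>> That is the psql \d shorthand; spelled out, the index would have been
>>>> created along these lines:
>>>>
>>>>   create index martin_mailbox_deleted
>>>>       on mailbox_messages using btree (mailbox, uid)
>>>>    where not deleted;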
>>>>
>>>> EXPLAIN ANALYZE still looked bad:
>>>>
>>>> ---
>>>> Index Scan using martin_mailbox on mailbox_messages mm
>>>> (cost=0.00..243326004.21 rows=386398 width=16) (actual
>>>> time=2.161..1050004.868 rows=397336 loops=1)
>>>> Index Cond: (mailbox = 441)
>>>> Filter: ((NOT deleted) AND (uid = ANY ('{16480,16481,... add your
>>>> 2.8MB here}'))
>>>> Total runtime: 1050230.858 ms <- SEE THIS?
>>>> ---
>>>>
>>>> After an optimization I found mentioned here:
>>>> https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/,
>>>> I changed the query to look like this:
>>>>
>>>> "select mm.uid, mm.modseq, mm.message from mailbox_messages mm where
>>>> mm.mailbox=441 and not mm.deleted and
>>>> mm.uid=any(VALUES(16480),(16481)..)"
>>>>
>>>> Apparently the VALUES trick builds a temp table, which makes the query a
>>>> lot easier for Postgres to digest.
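>>>>
>>>> Roughly, it ends up behaving as if the query had been written as a join
>>>> against an inline table (a sketch with a short list; the ANY form also
>>>> de-duplicates the list, which is what the HashAggregate below is doing):
>>>>
>>>>   select mm.uid, mm.modseq, mm.message
>>>>     from mailbox_messages mm
>>>>     join (values (16480), (16481), (16482)) as v(uid) on v.uid = mm.uid
>>>>    where mm.mailbox = 441 and not mm.deleted;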
>>>>
>>>> I got much better times (236x faster!):
>>>>
>>>> ---
>>>> Nested Loop (cost=6621.38..6771.33 rows=194822 width=16) (actual
>>>> time=616.886..4312.135 rows=397336 loops=1)
>>>> -> HashAggregate (cost=6621.38..6623.38 rows=200 width=4) (actual
>>>> time=616.693..1010.668 rows=441425 loops=1)
>>>> -> Values Scan on "*VALUES*" (cost=0.00..5517.81
>>>> rows=441425 width=4) (actual time=0.006..214.002 rows=441425 loops=1)
>>>> -> Index Scan using martin_mailbox_deleted on mailbox_messages mm
>>>> (cost=0.00..0.69 rows=4 width=16) (actual time=0.006..0.006 rows=1
>>>> loops=441425)
>>>> Index Cond: ((mailbox = 441) AND (uid = "*VALUES*".column1))
>>>> Total runtime: 4446.153 ms
>>>> ---
>>>>
>>>> I think this easy optimization will do AOX a lot of good overall; I see
>>>> similar queries all the time on our server, averaging about 2000 ms.
>>>>
>>>> A patch would be very much welcome :-)
>>>>
>>>>