Re: [HACKERS] GIN improvements part2: fast scan

Tomas Vondra Sun, 26 Jan 2014 10:21:51 -0800

On 26.1.2014 17:14, Heikki Linnakangas wrote:
> 
> I would actually expect it to be fairly effective for that query, so
> that's a bit surprising. I added counters to see where the calls are
> coming from, and it seems that about 80% of the calls are actually
> coming from this little feature I explained earlier:
> 
>> In addition to that, I'm using the ternary consistent function to check
>> if minItem is a match, even if we haven't loaded all the entries yet.
>> That's less important, but I think for something like "rare1 | (rare2 &
>> frequent)" it might be useful. It would allow us to skip fetching
>> 'frequent', when we already know that 'rare1' matches for the current
>> item. I'm not sure if that's worth the cycles, but it seemed like an
>> obvious thing to do, now that we have the ternary consistent function.
> 
> So, that clearly isn't worth the cycles :-). At least not with an
> expensive consistent function; it might be worthwhile if we pre-build
> the truth-table, or cache the results of the consistent function.
> 
> Attached is a quick patch to remove that, on top of all the other
> patches, if you want to test the effect.

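For anyone skimming the thread, here is a rough sketch of the idea quoted above, written as plain three-valued (Kleene) logic in Python. It is an illustration only, not the GIN code: the `Ternary` enum and the `t_and`, `t_or` and `consistent` helpers are hypothetical names, and the query shape is the "rare1 | (rare2 & frequent)" example from the quote. Entries that have not been loaded yet are passed in as MAYBE.

```python
from enum import Enum


class Ternary(Enum):
    # Ordered so that FALSE < MAYBE < TRUE, which makes AND/OR simple min/max.
    FALSE = 0
    MAYBE = 1
    TRUE = 2


def t_and(a, b):
    """Three-valued AND: the 'least true' operand wins."""
    return Ternary(min(a.value, b.value))


def t_or(a, b):
    """Three-valued OR: the 'most true' operand wins."""
    return Ternary(max(a.value, b.value))


def consistent(rare1, rare2, frequent):
    """Hypothetical ternary check for the query rare1 | (rare2 & frequent)."""
    return t_or(rare1, t_and(rare2, frequent))


# 'frequent' has not been fetched yet, so it is MAYBE.
# rare1 matches: the answer is already TRUE, so fetching 'frequent' can be skipped.
print(consistent(Ternary.TRUE, Ternary.FALSE, Ternary.MAYBE))
# rare1 does not match but rare2 does: the answer depends on 'frequent'.
print(consistent(Ternary.FALSE, Ternary.TRUE, Ternary.MAYBE))
```

The catch discussed above is that every such early check is one more call to the consistent function, which is where the extra cycles went.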

Indeed, the patch significantly improved the performance. The total
runtime is almost exactly the same as on 9.3 (~22 seconds for 1000
queries). The timing chart (patched vs. 9.3) is attached.

A table with the number of queries whose duration ratio falls below a
given threshold looks like this:

  threshold |  count | percentage
-------------------------------------
   0.5      |      3 |       0.3%
   0.75     |     45 |       4.5%
   0.9      |    224 |      22.4%
   1.0      |    667 |      66.7%
   1.05     |    950 |      95.0%
   1.1      |    992 |      99.2%

The ratio is just a measure of how much time a query took compared to 9.3:

    ratio = (duration on patched HEAD) / (duration on 9.3)

The table is cumulative, e.g. values in the 0.9 row mean that for 224
queries the duration with the patches was below 90% of the duration on 9.3.
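In case it helps to reproduce the table, this is roughly how such a cumulative summary can be computed. It is a minimal sketch, not the script I actually used: the function name `cumulative_counts` and the sample durations are made up for illustration, and it assumes the two lists hold per-query durations in the same order.

```python
def cumulative_counts(patched, baseline, thresholds):
    """Return (threshold, count, percentage) rows for duration ratios.

    ratio = (duration on patched HEAD) / (duration on 9.3)
    Each row counts the queries whose ratio is strictly below the threshold,
    so the counts are cumulative across rows.
    """
    ratios = [p / b for p, b in zip(patched, baseline)]
    total = len(ratios)
    rows = []
    for t in thresholds:
        count = sum(1 for r in ratios if r < t)
        rows.append((t, count, 100.0 * count / total))
    return rows


# Hypothetical durations in milliseconds, four queries only.
patched = [10, 18, 20, 33]
baseline = [20, 20, 20, 30]

for threshold, count, pct in cumulative_counts(patched, baseline, [0.75, 1.0, 1.05]):
    print(f"{threshold:>6} | {count:>5} | {pct:>6.1f}%")
```

Note the strict comparison: a query with a ratio of exactly 1.0 is not counted in the 1.0 row.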

IMHO the table suggests that with the last patch we're fine - the majority
of queries (~67%) are faster than on 9.3, and the tail is very short. There
are just 2 queries that took more than 15% longer, compared to 9.3. And
we're talking about 20ms vs. 30ms, so chances are this is just random
noise.

So IMHO we can go ahead, and maybe tune this a bit more in the future.

regards
Tomas

<<attachment: gin_query_durations_fixed.png>>
