Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-26 Thread Feike Steenbergen
Sorry, didn't respond to all your questions:

 What version of PostgreSQL are running?  'select version();'

PostgreSQL 9.3.4 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

 What do you get when you do analyze verbose print_list?

# analyze verbose print_list ;
INFO:  analyzing print_list
INFO:  print_list: scanned 53712 of 53712 pages, containing 3626950
live rows and 170090 dead rows; 300000 rows in sample, 3626950
estimated total rows
ANALYZE
Time: 6656.037 ms


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-26 Thread Feike Steenbergen
On 25 March 2015 at 22:45, Jeff Janes jeff.ja...@gmail.com wrote:

 How can the avg_width be 4 when the vast majority of entries are 7
 characters long?

The datatype is an enum, as I understand it, an enum type always
occupies 4 bytes




Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-26 Thread Tomas Vondra
On 26.3.2015 08:48, Jeff Janes wrote:

 OK, this is starting to look like a long-standing bug to me.
 
 If it only sees 3 distinct values, and all three are present at least
 twice, it throws all of them into the MCV list. But if one of those 3
 were present just once, then it tests them to see if they qualify.
 The test for inclusion is that it has to be present more than once,
 and that it must be over-represented by 25%.
 
 Let's say it sampled 30,000 rows and found 29,900 of one value, 99 of
 another, and 1 of a third.
 
 But that turns into the second one needing to be present 12,500 times.
 The average value is present 10,000 times (30,000 samples with 3
 distinct values) and 25% more than that is 12,500.  So it is excluded.
 
 It seems to me that a more reasonable criteria is that it must be
 over-represented 25% compared to the average of all the remaining values
 not yet accepted into the MCV list.  I.e. all the greater ones should be
 subtracted out before computing the over-representation threshold.

That might work IMO, but maybe we should increase the coefficient a bit
(say, from 1.25 to 2), not to produce needlessly long MCV lists.


 It is also grossly inconsistent with the other behavior.  If they are
 29900; 98; 2 then all three go into the MCV.

Isn't the mincount still 12500? How could all three get into the MCV?


-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-26 Thread Jeff Janes
On Thu, Mar 26, 2015 at 5:44 AM, Tomas Vondra tomas.von...@2ndquadrant.com
wrote:

 On 26.3.2015 08:48, Jeff Janes wrote:
 
  OK, this is starting to look like a long-standing bug to me.
 
  If it only sees 3 distinct values, and all three are present at least
  twice, it throws all of them into the MCV list. But if one of those 3
  were present just once, then it tests them to see if they qualify.
  The test for inclusion is that it has to be present more than once,
  and that it must be over-represented by 25%.
 
  Let's say it sampled 30,000 rows and found 29,900 of one value, 99 of
  another, and 1 of a third.
 
  But that turns into the second one needing to be present 12,500 times.
  The average value is present 10,000 times (30,000 samples with 3
  distinct values) and 25% more than that is 12,500.  So it is excluded.
 
  It seems to me that a more reasonable criteria is that it must be
  over-represented 25% compared to the average of all the remaining values
  not yet accepted into the MCV list.  I.e. all the greater ones should be
  subtracted out before computing the over-representation threshold.

 That might work IMO, but maybe we should increase the coefficient a bit
 (say, from 1.25 to 2), not to produce needlessly long MCV lists.


That wouldn't work here, because at the point of decision the value present
99 times contributes half the average, so the average is 50, and of course
it can't possibly be twice of that.
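To make the arithmetic of the subtract-as-you-go proposal concrete, here is a hypothetical Python sketch (my simplification, not the actual analyze.c logic):

```python
# Hypothetical sketch of the proposed rule: before testing each candidate,
# subtract the counts of values already accepted into the MCV list, then
# compare against 1.25x the average of what remains.
counts = [29900, 99, 1]           # 30,000 sampled rows, 3 distinct values
remaining_rows = sum(counts)
remaining_distinct = len(counts)

accepted = []
for c in counts:                  # counts are already in descending order
    avg = remaining_rows / remaining_distinct
    if c >= 2 and c > avg * 1.25:
        accepted.append(c)
        remaining_rows -= c
        remaining_distinct -= 1
    else:
        break

# First pass: avg = 10000, and 29900 > 12500, so it is accepted.
# Second pass: avg = 100/2 = 50, and 99 > 62.5, so it is accepted too.
# With a coefficient of 2 instead of 1.25 the second test would be
# 99 > 100, which fails -- the objection raised above.
print(accepted)  # [29900, 99]
```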

I have a patch, but is there a way to determine how it affects a wide
variety of situations?  I guess run `make installcheck`, then analyze, then
dump pg_stats, with the patch and without the patch, and then compare the
dumps?




  It is also grossly inconsistent with the other behavior.  If they are
  29900; 98; 2 then all three go into the MCV.

 Isn't the mincount still 12500? How could all three get into the MCV?


If all observed values are observed at least twice, it takes a different
path through the code.  It just keeps them all in the MCV list. That is
what is causing the instability for the OP.  If the 3rd most common is seen
twice, then all three are kept.  If it is seen once, then only the most
common is kept.  See the if statements at lines 2494 and 2585:

else if (toowide_cnt == 0 && nmultiple == ndistinct)

if (track_cnt == ndistinct &&

Cheers,

Jeff


analyze_highly_skewed.patch
Description: Binary data



Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-26 Thread Tomas Vondra
On 26.3.2015 17:35, Jeff Janes wrote:
 On Thu, Mar 26, 2015 at 5:44 AM, Tomas Vondra
 tomas.von...@2ndquadrant.com wrote:
 
 That might work IMO, but maybe we should increase the coefficient a
 bit (say, from 1.25 to 2), not to produce needlessly long MCV
 lists.
 
 That wouldn't work here, because at the point of decision the value 
 present 99 times contributes half the average, so the average is 50, 
 and of course it can't possibly be twice of that.

Oh, right. How could I miss that? ;-)

 I have a patch, but is there a way to determine how it affects a
 wide variety of situations? I guess run `make installcheck`, then
 analyze, then dump pg_stats, with the patch and without the patch,
 and then compare the dumpsj?

I doubt there's such a way. I'd argue that if you can show this always
generates longer MCV lists, we can assume the stats are probably more
accurate, and thus the plans should be better.

Of course, there's always the possibility that the plan was good by
luck, and improving the estimates will result in a worse plan. But I
don't think we can really fix that :-(

 It is also grossly inconsistent with the other behavior. If they
 are 29900; 98; 2 then all three go into the MCV.
 
 Isn't the mincount still 12500? How could all three get into the
 MCV?
 
 If all observed values are observed at least twice, it takes a 
 different path through the code. It just keeps them all in the MCV 
 list. That is what is causing the instability for the OP. If the 3rd 
 most common is seen twice, then all three are kept. If it is seen 
 once, then only the most common is kept. See the if statements at lines
 2494 and 2585:
 
 else if (toowide_cnt == 0 && nmultiple == ndistinct)
 
 if (track_cnt == ndistinct &&

Aha, I think I see it now. I've been concentrating on this code:

   avgcount = (double) samplerows / ndistinct;
   /* set minimum threshold count to store a value */
   mincount = avgcount * 1.25;
   if (mincount < 2)
       mincount = 2;

but this is actually too late, because first we do this:

  else if (toowide_cnt == 0 && nmultiple == ndistinct)
  {
      stats->stadistinct = ndistinct;
  }

and that only happens if each item is observed at least 2x in the sample
(and the actual Haas and Stokes estimator is not used).

And then we do this:

  if (track_cnt == ndistinct && toowide_cnt == 0 &&
      stats->stadistinct > 0 && track_cnt <= num_mcv)
  {
      num_mcv = track_cnt;
  }

so that we track everything.

If at least one value is seen only 1x, it works differently, and we use
the code with (1.25*avgcount) threshold.

I wonder where the 1.25x threshold comes from - whether it's something
we came up with, or if it comes from some paper. I guess the former.


-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-26 Thread Jeff Janes
On Wed, Mar 25, 2015 at 1:00 PM, Feike Steenbergen 
feikesteenber...@gmail.com wrote:

 On 25 March 2015 at 19:07, Jeff Janes jeff.ja...@gmail.com wrote:

  Also, I doubt that that is the problem in the first place.  If you
 collect a
  sample of 30,000 (which the default target size of 100 does), and the
  frequency of the second most common is really 0.00307333 at the time you
  sampled it, you would expect to find it 92 times in the sample. The
 chances
  against actually finding 1 instead of around 92 due to sampling error are
  astronomical.

 It can be that the distribution of values is very volatile; we hope
 the increased stats target (from the default=100 to 1000 for this
 column) and frequent autovacuum and autoanalyze helps in keeping the
 estimates correct.

 It seems that it did find some other records (<> 'PRINTED'), as is
 demonstrated in the stats where there was only one value in the MCV
 list: the frequency was 0.996567 and the fraction of nulls was 0,
 therefore leaving 0.003433 for other values. But because none of them
 were in the MCV and MCF list, they were all treated as equals. They
 are certainly not equal.
 
 I do not know why some values were found (they are mentioned in the
 histogram_bounds), but are not part of the MCV list; as you say, the
 likelihood of only 1 item being found is very small.

 Does anyone know the criteria for a value to be included in the MCV list?


OK, this is starting to look like a long-standing bug to me.

If it only sees 3 distinct values, and all three are present at least
twice, it throws all of them into the MCV list.  But if one of those 3
were present just once, then it tests them to see if they qualify.  The
test for inclusion is that a value has to be present more than once, and
that it must be over-represented by 25%.

Let's say it sampled 30,000 rows and found 29,900 of one value, 99 of
another, and 1 of a third.

But that turns into the second one needing to be present 12,500 times.
The average value is present 10,000 times (30,000 samples with 3 distinct
values) and 25% more than that is 12,500.  So it is excluded.

It seems to me that a more reasonable criteria is that it must be
over-represented 25% compared to the average of all the remaining values
not yet accepted into the MCV list.  I.e. all the greater ones should be
subtracted out before computing the over-representation threshold.

It is also grossly inconsistent with the other behavior.  If they are
29900; 98; 2 then all three go into the MCV.
If they are 29900; 99; 1 then only the highest one goes in.  The second
one gets evicted for being slightly *more* popular.

This is around line 2605 of src/backend/commands/analyze.c in head.
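The inconsistency can be sketched with a small Python simulation of the two code paths (a hypothetical simplification of what is described above, not the actual analyze.c code):

```python
def mcv_candidates(counts, num_mcv=100):
    """Simplified sketch of the two MCV paths: if every sampled value
    appears at least twice, keep them all; otherwise apply the
    1.25x-of-average mincount threshold."""
    counts = sorted(counts, reverse=True)
    samplerows = sum(counts)
    ndistinct = len(counts)
    if all(c >= 2 for c in counts):
        # the "all values seen at least twice" path: track everything
        return counts[:num_mcv]
    # otherwise: a value must appear more than once AND exceed 1.25x average
    mincount = max((samplerows / ndistinct) * 1.25, 2)
    return [c for c in counts if c > mincount][:num_mcv]

print(mcv_candidates([29900, 98, 2]))  # [29900, 98, 2] -- all three kept
print(mcv_candidates([29900, 99, 1]))  # [29900] -- the 99 value is evicted
```

With 30,000 sampled rows and 3 distinct values, mincount comes out to 12,500 on the second path, which is exactly why only the top value survives.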

Cheers,

Jeff


Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-25 Thread Feike Steenbergen
Hi, thanks for having a look and thinking with us

On 25 March 2015 at 13:45, Tomas Vondra tomas.von...@2ndquadrant.com wrote:
 Can you post results for this query?

 SELECT status, COUNT(*) FROM print_list GROUP BY 1

     status     |  count
----------------+---------
 ERROR          |     159
 PREPARED       |   10162
 PRINTED        | 3551367
 TO_BE_PREPARED |       2
(4 rows)

 We can also increase the 'Stats target' for this table, which will
 cause the statistics to contain information about 'NOT_YET_PRINTED'
 more often, but even then, it may not find any of these records, as
 they sometimes do not exist.

 This is a better solution, IMHO.

We'll have a go at this. Also, if what you say about values having to
appear at least twice holds, the other values may make it into
most_common_*, which would make it clearer to us.

We're a bit hesitant to decrease random_page_cost (currently 3 in this
cluster) as a lot more is happening on this database.




Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-25 Thread Tomas Vondra
On 25.3.2015 13:04, Feike Steenbergen wrote:
...
 When analyzing pg_stats we sometimes have the following: (Note:
 'NOT_YET_PRINTED' has not been found during this analyze, these are
 real values)
 
  attname| status
  inherited  | f
  null_frac  | 0
  avg_width  | 4
  n_distinct | 3
  most_common_vals   | {PRINTED}
  most_common_freqs  | {0.996567}
  histogram_bounds   | {PREPARED,ERROR}
  correlation| 0.980644
 
 A question about this specific entry, which some of you may be able to
 shed some light on:
 
 most_common_vals contains only 1 entry, why is this? I would expect to
 see 3 entries, as it has n_distinct=3

To be included in the MCV list, the value has to actually appear in the
random sample at least twice, IIRC. If the values are very rare (e.g. if
you only have 10 such rows out of 3.5M), that may not happen.

You may try increasing the statistics target for this column, which
should make the sample larger and stats more detailed (max is 10000,
which would use a sample of ~3M rows, i.e. almost the whole table).
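For reference, ANALYZE draws 300 sampled rows per unit of statistics target (per the comments in analyze.c); a quick sketch of the sizes involved here:

```python
def analyze_sample_rows(stats_target):
    # ANALYZE targets 300 * stats_target sampled rows; the multiplier
    # comes from the Chaudhuri et al. bound cited in analyze.c.
    return 300 * stats_target

print(analyze_sample_rows(100))    # default target  -> 30000 rows
print(analyze_sample_rows(1000))   # OP's new target -> 300000 rows
print(analyze_sample_rows(10000))  # maximum target  -> 3000000 rows,
                                   # nearly this whole 3.5M-row table
```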

 When looking at
 http://www.postgresql.org/docs/current/static/row-estimation-examples.html
 we can see that an estimate > 5000 is what is to be expected for these
 statistics:
 
 # select ( (1 - 0.996567)/2 * 3500000 )::int;
  int4
 ------
  6008
 (1 row)
 
 But why does it not record the frequency of 'PREPARED' and 'ERROR'
 in most_common_*?

Can you post results for this query?

SELECT status, COUNT(*) FROM print_list GROUP BY 1

I'd like to know how frequent the other values are.

 
 Our current strategies in mitigating this problem is decreasing the 
 autovacuum_*_scale_factor for this specific table, therefore 
 triggering more analyses and vacuums.

I'm not sure this is a good solution. The problem is elsewhere, IMHO.

 This is helping somewhat: if the problem occurs, it is often solved
 automatically when autoanalyze analyzes this table; it is analyzed
 many times an hour currently.
 
 We can also increase the 'Stats target' for this table, which will
 cause the statistics to contain information about 'NOT_YET_PRINTED'
 more often, but even then, it may not find any of these records, as
 they sometimes do not exist.

This is a better solution, IMHO.

 
 Could you help us to find a strategy to troubleshoot this issue
 further?

You might also make the index scans cheaper, so that the switch to
sequential scan happens later (when more rows are estimated). Try
decreasing random_page_cost from 4 (default) to 1.5 or something like that.

It may hurt other queries, though, depending on the dataset size etc.

-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-25 Thread Feike Steenbergen
I'm posting this as I am trying to understand what has happened.
TLDR: The problem seems to be fixed now.

By bumping the statistics_target we see that most_common_vals is
having its contents filled more often, causing way better estimates:

 attname| status
 inherited  | f
 null_frac  | 0
 avg_width  | 4
 n_distinct | 3
 most_common_vals   | {PRINTED,PREPARED,ERROR}
 most_common_freqs  | {0.996863,0.00307333,6.3e-05}
 histogram_bounds   | (null)
 correlation| 0.98207
 most_common_elems  | (null)
 most_common_elem_freqs | (null)
 elem_count_histogram   | (null)

Basically 100% of the records are accounted for in these statistics,
the planner now consistently estimates the number of rows to be very
small for other values.

Before bumping the target we didn't have information for 0.34% of the
rows, which in this case means roughly 11K rows.

What is the reasoning behind having at least 2 hits before including
it in the most_common_* columns?




Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-25 Thread Tom Lane
Feike Steenbergen feikesteenber...@gmail.com writes:
 On 25 March 2015 at 13:45, Tomas Vondra tomas.von...@2ndquadrant.com wrote:
 We can also increase the 'Stats target' for this table, which will
 cause the statistics to contain information about 'NOT_YET_PRINTED'
 more often, but even then, it may not find any of these records, as
 they sometimes do not exist.

 This is a better solution, IMHO.

 We'll have a go at this, also if what you say about values having to
 appear at least twice, the other values may make it into
 most_common_*, which would make it clearer to us.

In principle increasing the stats target should fix this, whether or not
'NOT_YET_PRINTED' appears in the MCV list after any particular analyze;
because what will happen is that the frequency for 'PRINTED' will more
nearly approach 1, and so the estimated selectivity for other values
will drop even if they're not in the list.
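A sketch of the estimate in question, following the formula in the row-estimation docs (simplified; the function name is mine):

```python
# For a value not in the MCV list, the planner divides the frequency mass
# left over after the MCV entries (and NULLs) evenly among the remaining
# distinct values. Simplified sketch of that rule.
def est_rows_non_mcv(total_rows, mcv_freqs, n_distinct, null_frac=0.0):
    leftover = 1.0 - sum(mcv_freqs) - null_frac
    return total_rows * leftover / (n_distinct - len(mcv_freqs))

# With only PRINTED in the list at 0.996567 (the OP's original stats):
print(round(est_rows_non_mcv(3500000, [0.996567], 3)))  # 6008

# As the PRINTED frequency estimate approaches 1, the estimate for the
# other values drops, even if they never enter the MCV list:
print(round(est_rows_non_mcv(3500000, [0.9999], 3)))    # 175
```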

regards, tom lane




Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-25 Thread Jeff Janes
On Wed, Mar 25, 2015 at 9:07 AM, Feike Steenbergen 
feikesteenber...@gmail.com wrote:

 I'm posting this as I am trying to understand what has happened.
 TLDR: The problem seems to be fixed now.

 By bumping the statistics_target we see that most_common_vals is
 having its contents filled more often, causing way better estimates:

  attname| status
  inherited  | f
  null_frac  | 0
  avg_width  | 4
  n_distinct | 3
  most_common_vals   | {PRINTED,PREPARED,ERROR}
  most_common_freqs  | {0.996863,0.00307333,6.3e-05}
  histogram_bounds   | (null)
  correlation| 0.98207
  most_common_elems  | (null)
  most_common_elem_freqs | (null)
  elem_count_histogram   | (null)

 Basically 100% of the records are accounted for in these statistics,
 the planner now consistently estimates the number of rows to be very
 small for other values.

 Before bumping the target we didn't have information for 0.34% of the
 rows, which in this case means roughly 11K rows.

 What is the reasoning behind having at least 2 hits before including
 it in the most_common_* columns?


If you sample a small portion of the table, then anything only present once
is going to have a huge uncertainty on its estimate.

Consider the consequences of including things sampled once.  100% of the
rows that got sampled will be in the sample at least once.  That means
most_common_freqs will always sum to 100%.   Which means we are declaring
that anything not observed in the sample has a frequency of
0.0000%, which is clearly beyond what we have any reasonable
evidence to support.
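The 100%-coverage point can be sketched numerically with a hypothetical 10-row sample:

```python
from collections import Counter

# Hypothetical 10-row sample from a table that also contains values the
# sample never saw at all.
sample = ['A'] * 7 + ['B'] * 2 + ['C']
counts = Counter(sample)

# If singletons were admitted, the MCV frequencies would cover the whole
# distribution, implicitly assigning frequency 0 to every unseen value.
with_singletons = sum(c / len(sample) for c in counts.values())

# Requiring at least two occurrences leaves some frequency mass for the
# values the sample missed.
at_least_twice = sum(c / len(sample) for c in counts.values() if c >= 2)

print(round(with_singletons, 2))  # 1.0
print(round(at_least_twice, 2))   # 0.9
```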

Also, I doubt that that is the problem in the first place.  If you collect
a sample of 30,000 (which the default target size of 100 does), and the
frequency of the second most common is really 0.00307333 at the time you
sampled it, you would expect to find it 92 times in the sample. The chances
against actually finding 1 instead of around 92 due to sampling error are
astronomical.

The problem seems to be rapidly changing stats, not too small of a target
size (unless your original target size was way below the current default
value, forgive me if you already reported that, I didn't see it anywhere).

If you analyze the table at a point when it is 100% PRINTED, there is no
way of knowing based on that analysis alone what the distribution of
!='PRINTED' would be, should such values ever arise.

Maybe it would work better if you built the partial index where status =
'NOT_YET_PRINTED', instead of !='PRINTED'.

Cheers,

Jeff


Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-25 Thread Feike Steenbergen
On 25 March 2015 at 19:07, Jeff Janes jeff.ja...@gmail.com wrote:

 Also, I doubt that that is the problem in the first place.  If you collect a
 sample of 30,000 (which the default target size of 100 does), and the
 frequency of the second most common is really 0.00307333 at the time you
 sampled it, you would expect to find it 92 times in the sample. The chances
 against actually finding 1 instead of around 92 due to sampling error are
 astronomical.

It can be that the distribution of values is very volatile; we hope
the increased stats target (from the default=100 to 1000 for this
column) and frequent autovacuum and autoanalyze helps in keeping the
estimates correct.

 It seems that it did find some other records (<> 'PRINTED'), as is
 demonstrated in the stats where there was only one value in the MCV
 list: the frequency was 0.996567 and the fraction of nulls was 0,
 therefore leaving 0.003433 for other values. But because none of them
 were in the MCV and MCF list, they were all treated as equals. They
 are certainly not equal.
 
 I do not know why some values were found (they are mentioned in the
 histogram_bounds), but are not part of the MCV list; as you say, the
 likelihood of only 1 item being found is very small.

Does anyone know the criteria for a value to be included in the MCV list?

 The problem seems to be rapidly changing stats, not too small of a target
 size (unless your original target size was way below the current default
 value, forgive me if you already reported that, I didn't see it anywhere).
 Maybe it would work better if you built the partial index where status =
 'NOT_YET_PRINTED', instead of !='PRINTED'.

Thanks, we did create a partial index on 'NOT_YET_PRINTED' today to
help aiding these kind of queries.




Re: [PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-25 Thread Jeff Janes
On Wed, Mar 25, 2015 at 1:00 PM, Feike Steenbergen 
feikesteenber...@gmail.com wrote:

 On 25 March 2015 at 19:07, Jeff Janes jeff.ja...@gmail.com wrote:

  Also, I doubt that that is the problem in the first place.  If you
 collect a
  sample of 30,000 (which the default target size of 100 does), and the
  frequency of the second most common is really 0.00307333 at the time you
  sampled it, you would expect to find it 92 times in the sample. The
 chances
  against actually finding 1 instead of around 92 due to sampling error are
  astronomical.

 It can be that the distribution of values is very volatile; we hope
 the increased stats target (from the default=100 to 1000 for this
 column) and frequent autovacuum and autoanalyze helps in keeping the
 estimates correct.

 It seems that it did find some other records ( 'PRINTED), as is
 demonstrated in the stats where there was only one value in the MCV
 list: the frequency was 0.996567 and the fraction of nulls was 0,
 therefore leaving 0.03+ for other values. But because none of them
 were in the MCV and MCF list, they were all treated as equals. They
 are certainly not equal.


Now that I look back at the first post you made, it certainly looks like
the statistics target was set to 1 when that was analyzed, not to 100.  But
it doesn't look quite correct for that, either.

What version of PostgreSQL are running?  'select version();'

What do you get when you do analyze verbose print_list?

How can the avg_width be 4 when the vast majority of entries are 7
characters long?

Cheers,

Jeff


[PERFORM] Index only scan sometimes switches to sequential scan for small amount of rows

2015-03-25 Thread Feike Steenbergen
Hi,

Situation:

We have a table with 3,500,000+ rows, which contain items that need to
be printed or have been printed previously.

Most of these records have a status of 'PRINTED'; we have a partial
index on this table WHERE status <> 'PRINTED'.
During normal operation there will be < 10 records matching 'NOT_YET_PRINTED'.
When using the index scan this is done in < 5ms, but when the
sequential scan is involved the query runs > 500ms.


We query this table often in the form:

SELECT *
  FROM print_list
  JOIN [...]
  JOIN [...]
 WHERE status = 'NOT_YET_PRINTED'
 LIMIT 8;

This query is currently switching between a sequential scan on the
print_list table and an index scan on the previously mentioned index.

When doing an explain analyze on the queries we see that it sometimes
expects to return > 5000 records when in reality it is only < 5
records that are returned, for example:

   ->  Index Scan using print_list_status_idx on print_list
(cost=0.27..1138.53 rows=6073 width=56) (actual time=0.727..0.727
rows=0 loops=1)

Sometimes, this results in the planner choosing a sequential scan for
this query.

When analyzing pg_stats we sometimes have the following: (Note:
'NOT_YET_PRINTED' has not been found during this analyze, these are
real values)

 attname| status
 inherited  | f
 null_frac  | 0
 avg_width  | 4
 n_distinct | 3
 most_common_vals   | {PRINTED}
 most_common_freqs  | {0.996567}
 histogram_bounds   | {PREPARED,ERROR}
 correlation| 0.980644

A question about this specific entry, which some of you may be able to
shed some light on:

most_common_vals contains only 1 entry, why is this? I would expect to
see 3 entries, as it has n_distinct=3

When looking at
http://www.postgresql.org/docs/current/static/row-estimation-examples.html
we can see that an estimate > 5000 is what is to be expected for these
statistics:

# select ( (1 - 0.996567)/2 * 3500000 )::int;
 int4
------
 6008
(1 row)

But why does it not record the frequency of 'PREPARED' and 'ERROR' in
most_common_*?

Our current strategies in mitigating this problem is decreasing the
autovacuum_*_scale_factor for this specific table, therefore
triggering more analyses and vacuums.

This is helping somewhat: if the problem occurs, it is often solved
automatically when autoanalyze analyzes this table; it is analyzed many
times an hour currently.

We can also increase the 'Stats target' for this table, which will
cause the statistics to contain information about 'NOT_YET_PRINTED'
more often, but even then, it may not find any of these records, as
they sometimes do not exist.

Could you help us to find a strategy to troubleshoot this issue further?

Some specific questions:
- We can see it is doing a sequential scan of the full table (3.5mio
records) even when it only expects 8000 records to be returned, we
would expect this not to happen so soon.
- Why is most_common_* not filled when there are only 3 distinct values?

Feike Steenbergen

