Assuming that the threshold
for switching to an indexscan is somewhere around selectivity 0.005
(I am not certain offhand, but it's in that general area), this cannot
possibly require more than 200 MCV slots, and for most data
distributions it'd be a whole lot less.
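A minimal sketch (in Python) of the arithmetic behind the "200 MCV slots" bound above; the 0.005 figure is only the rough threshold quoted, not an exact planner constant:

    # Frequencies of distinct values sum to at most 1.0, so the number of
    # values that can each exceed the indexscan threshold is bounded.
    threshold = 0.005                         # rough figure from the message above
    max_useful_mcv_slots = int(1.0 / threshold)
    print(max_useful_mcv_slots)               # -> 200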
That's a really good
Nathan Boley [EMAIL PROTECTED] writes:
Given that, wouldn't it be smarter to consider a value as an mcv
candidate iff it has a density greater than 0.005, rather than having
a count greater than 1.5*average?
Yeah, perhaps ... want to experiment with that? Though I'd be a bit
worried about how
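A hedged sketch of the two candidate rules being compared (density above a fixed threshold vs. count above 1.5x the average). This follows the rules as stated in the thread, not ANALYZE's actual code; the function names and the 0.005 default are illustrative:

    from collections import Counter

    def mcv_candidates_by_count(sample):
        # Rule as described above: keep values whose sample count exceeds
        # 1.5x the average count per distinct value.
        counts = Counter(sample)
        avg = len(sample) / len(counts)
        return {v for v, c in counts.items() if c > 1.5 * avg}

    def mcv_candidates_by_density(sample, threshold=0.005):
        # Proposed rule: keep values whose estimated density exceeds the
        # selectivity at which an indexscan would still be chosen.
        counts = Counter(sample)
        n = len(sample)
        return {v for v, c in counts.items() if c / n > threshold}

On a strongly skewed sample the two rules can disagree: a value may fall under 1.5x the average count and still be frequent enough that treating it as non-MCV picks the wrong plan.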
Tom Lane [EMAIL PROTECTED] writes:
Gregory Stark [EMAIL PROTECTED] writes:
It's possible that the second option I described -- teaching Append when to
use something other than sum() -- would only work in the cases where
constraint exclusion could be fixed though. In which case having
Obviously we run into problems when
a) we have a poor estimate for ndistinct - but then we have
worse problems
b) our length measure doesn't correspond well with ndistinct
in an interval
One more problem with low ndistinct values is that the condition might very well
hit no rows at all. But
Zeugswetter Andreas OSB sIT [EMAIL PROTECTED] writes:
I think for low ndistinct values we will want to know the exact
value + counts and not a bin. So I think we will want additional stats rows
that represent value 'a1' stats.
Isn't that what our most frequent values list does?
--
Gregory
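Gregory's question can be made concrete: when ndistinct fits within the statistics target, the MCV list already is an exact value-plus-frequency table, which is what Andreas asks for. A sketch of that idea (not ANALYZE's sampling code; the stats_target default is illustrative):

    from collections import Counter

    def build_mcv_list(sample, stats_target=100):
        counts = Counter(sample)
        if len(counts) <= stats_target:
            # Low ndistinct: every value gets a slot, so the stats are exact
            # per-value frequencies.
            return {v: c / len(sample) for v, c in counts.items()}
        # Otherwise only the most common values are kept; the rest fall
        # through to the histogram / ndistinct-based estimate.
        return {v: c / len(sample) for v, c in counts.most_common(stats_target)}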
One more problem with low ndistinct values is that the condition might
very well
hit no rows at all. But Idea 1 will largely overestimate the number of
hits.
That's a good point, but I don't see a clear solution. Maybe we could
look at past queries
and keep track of how often they return
I think that MCVs are the solution, right?
Only if
Nathan Boley [EMAIL PROTECTED] writes:
If we query on values that aren't in the table, the planner will
always overestimate the expected number of returned rows because it (
implicitly ) assumes that every query will return at least 1 record.
That's intentional and should not be changed. I
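The behaviour being defended here is the clamp itself: row estimates never drop below one. A minimal sketch of that behaviour (the function name mirrors the idea, not the backend's code):

    def clamp_row_est(rows):
        # A zero-row estimate would make every join above this node look free,
        # which is the kind of foolish join result the clamp exists to avoid.
        return max(1.0, float(round(rows)))

    print(clamp_row_est(0.02))     # -> 1.0, even when the true answer is "no rows"
    print(clamp_row_est(123.6))    # -> 124.0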
Nathan Boley [EMAIL PROTECTED] writes:
If we query on values that aren't in the table, the planner will
always overestimate the expected number of returned rows because it (
implicitly ) assumes that every query will return at least 1 record.
That's intentional and should not be changed.
Why? What if ( somehow ) we knew that there was a 90% chance that
query would return an empty result set on a big table with 20 non-mcv
distinct values. Currently the planner would always choose a seq scan,
where an index scan might be better.
(1) On what grounds do you assert the above?
Nathan Boley [EMAIL PROTECTED] writes:
(1) On what grounds do you assert the above?
For a table with 100 non-mcv rows, the planner estimates a result
set of cardinality 100/20 = 5, not 1.
The real problem in that situation is that you need another twenty slots
in the MCV list.
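The 100/20 = 5 figure follows from spreading the non-MCV rows evenly across the non-MCV distinct values. A rough sketch of that fallback estimate (parameter names are mine, not the backend's):

    def non_mcv_eq_rows(total_rows, mcv_total_freq, ndistinct, n_mcv):
        # Rows not covered by the MCV list, spread evenly over the distinct
        # values that are not in the MCV list.
        non_mcv_rows = total_rows * (1.0 - mcv_total_freq)
        return non_mcv_rows / (ndistinct - n_mcv)

    # 100 non-MCV rows over 20 non-MCV distinct values -> 5 expected rows,
    # even for a constant that matches no row at all.
    print(non_mcv_eq_rows(total_rows=100, mcv_total_freq=0.0, ndistinct=20, n_mcv=0))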
Tom Lane [EMAIL PROTECTED] writes:
(In fact, I don't think the plan would change, in this case. The reason
for the clamp to 1 row is to avoid foolish results for join situations.)
The screw case I've seen is when you have a large partitioned table where
constraint_exclusion fails to exclude
Gregory Stark [EMAIL PROTECTED] writes:
The screw case I've seen is when you have a large partitioned table where
constraint_exclusion fails to exclude the irrelevant partitions. You're going
to get 0 rows from all but the one partition which contains the 1 row you're
looking for. But since
Gregory Stark [EMAIL PROTECTED] writes:
It's possible that the second option I described -- teaching Append when to
use something other than sum() -- would only work in the cases where
constraint exclusion could be fixed though. In which case having fractional
row counts might actually be
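A sketch of the screw case above and the two directions mentioned (having Append take a max() instead of a sum(), or allowing fractional per-partition estimates). The per-partition figures are made up for illustration; this models the discussion, not the planner's code:

    def clamp(rows):
        # per-partition estimates are clamped to at least one row
        return max(1.0, float(round(rows)))

    per_partition = [0.01] * 1000       # 1000 partitions, ~0 matching rows each

    summed = sum(clamp(r) for r in per_partition)   # -> 1000.0 (gross overestimate)
    maxed = max(clamp(r) for r in per_partition)    # -> 1.0
    fractional = sum(per_partition)                 # -> ~10.0 (no clamping at all)

    print(summed, maxed, fractional)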
Tom Lane wrote:
This gets back to the discussions at PGCon about needing to have a more
explicit representation of partitioning. Right now, for a
many-partition table we spend huge amounts of time deriving the expected
behavior from first principles, each time we make a plan. And even then
On Sun, 2008-06-08 at 19:03 -0400, Tom Lane wrote:
Your argument seems to consider only columns having a normal
distribution. How badly does it fall apart for non-normal
distributions? (For instance, Zipfian distributions seem to be pretty
common in database work, from what I've seen.)
If
Your argument seems to consider only columns having a normal
distribution.
My example was based upon normally distributed data because people
usually know what they are and they are reasonably common.
How badly does it fall apart for non-normal distributions?
This should work to the extent
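One quick way to see why the distribution shape matters: compare what fraction of rows the top-N values cover under a discretised normal vs. a Zipf-like distribution. This is only a back-of-the-envelope illustration, not the estimator under discussion, and all parameters are arbitrary:

    import random
    from collections import Counter

    random.seed(0)
    N_ROWS, N_MCV = 100_000, 100

    normal_sample = [round(random.gauss(0, 50)) for _ in range(N_ROWS)]
    zipf_sample = [min(int(random.paretovariate(1.1)), 10_000) for _ in range(N_ROWS)]

    def mcv_coverage(sample, n_mcv=N_MCV):
        top = Counter(sample).most_common(n_mcv)
        return sum(c for _, c in top) / len(sample)

    print("normal :", mcv_coverage(normal_sample))
    print("zipf   :", mcv_coverage(zipf_sample))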
Nathan Boley [EMAIL PROTECTED] writes:
... There are two potential problems that I see with this approach:
1) It assumes the '=' is equivalent to '<= and >='. This is certainly
true for real numbers, but is it true for every equality relation that
eqsel predicts for?
The cases that