Say42 wrote:
Perhaps I am wrong, but I assume normalization is the usual case, and small
master (parent) tables are not very rare either.
Yes, my example is unusual, but it is _real_ and demonstrates PG
optimizer inaccuracy. Why don't we make the PG optimizer closer to
reality if we can? Is it so
Peter Eisentraut wrote:
But you haven't offered any analysis about the cause of this problem, so any
speculation about normalization, usual cases, caching effects, and so on is
unfounded and premature.
Ok. My previous message was a bit pompous and unfounded. Sorry.
Below I'll try to explain
Say42 [EMAIL PROTECTED] writes:
... Let's take my previous example (I repost the query and some lines
from 'explain' here for convenience):
select count(*) from conn.conn20060803 c where
exists (select code from belg_mobile tc
where c.bnum = tc.code and c.bnum like tc.code || '%'
Simon Riggs wrote:
On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote:
That's what I want to do:
1. Replace not very useful indexCorrelation with indexClustering.
An opinion such as "not very useful" isn't considered sufficient
explanation or justification for a change around here.
Not
Ron Mayer [EMAIL PROTECTED] writes:
It's common here for queries to vastly overestimate the
number of pages that would need to be read, because
postgresql's guess at the correlation is practically 0
despite the fact that the distinct values for any given
column are closely packed on a few pages.
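Ron's scenario is easy to reproduce with a toy model (illustrative Python, not PostgreSQL code; the page capacity and value layout are invented): every distinct value sits on a single page, yet the Pearson correlation between physical position and value, which is what the planner's correlation statistic measures, comes out near zero.

```python
import math

def pearson(xs, ys):
    # Plain Pearson correlation coefficient, hand-rolled for clarity.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

ROWS_PER_PAGE = 100  # hypothetical page capacity

# Ten distinct values, each stored as one contiguous 100-row run (so each
# value occupies exactly one page), but the runs are interleaved high/low.
run_order = [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]
values = [v for v in run_order for _ in range(ROWS_PER_PAGE)]
positions = list(range(len(values)))

clustered_corr = pearson(positions, values)       # near 0 despite clustering
sorted_corr = pearson(positions, sorted(values))  # near 1 for sorted storage

# Pages holding value 1: just one, even though the correlation is ~0.
pages_for_v1 = {p // ROWS_PER_PAGE for p, v in zip(positions, values) if v == 1}
```

So the statistic reports "uncorrelated" storage even though the planner could, in principle, read one page per value.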
On Sep 13, 2006, at 14:44 , Gregory Stark wrote:
I think we need a serious statistics jock to pipe up with some standard
metrics that do what we need. Otherwise we'll never have a solid footing
for the predictions we make and will never know how much we can trust them.
That said I'm
Gregory Stark wrote:
Ron Mayer [EMAIL PROTECTED] writes:
...vastly overestimate the number of pages .. because postgresql's guess
at the correlation is practically 0 despite the fact that the distinct
values for any given column are closely packed on a few pages.
I think we need a
Ron Mayer [EMAIL PROTECTED] writes:
I'm wildly guessing that the # of pages itself seems to be
a bigger factor than the sequential/random nature.
No, they're both important: fetching N pages in sequence is way cheaper
than fetching the same number of pages scattered all over. This is
partly
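For reference, the planner blends those two extremes using the square of the correlation, in the spirit of cost_index(). A simplified sketch (the page counts are made-up inputs, and the costs are the default seq_page_cost/random_page_cost settings; the real code also runs the page count through the Mackert-Lohman fetch estimate):

```python
# Simplified sketch (not the real code) of interpolating index I/O cost
# between the perfectly-sequential and fully-random extremes using
# correlation squared, as PostgreSQL's cost_index() does.
def index_io_cost(seq_pages, random_pages, correlation,
                  seq_page_cost=1.0, random_page_cost=4.0):
    min_io = seq_pages * seq_page_cost        # best case: ordered fetches
    max_io = random_pages * random_page_cost  # worst case: scattered fetches
    csquared = correlation ** 2
    return max_io + csquared * (min_io - max_io)

best = index_io_cost(100, 1000, 1.0)   # perfectly correlated
worst = index_io_cost(100, 1000, 0.0)  # uncorrelated
mid = index_io_cost(100, 1000, 0.5)
```

Note that because the correlation is squared, a mid-range correlation estimate still lands much closer to the random-I/O extreme.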
Joshua Reich [EMAIL PROTECTED] writes:
I lurk... I don't know if I'm a 'statistics jock', but I may be
valuable if only I had a better understanding of how the optimizer
works. I have been following this thread with interest, but could really
do with a good pointer to background
Tom Lane wrote:
I'm having a hard time getting excited about improving this query when
it's so badly coded in the first place. What's an ORDER BY doing in
an EXISTS subquery? The LIMIT is unnecessary too. And the inner WHERE
says nothing so much as "I don't know how to design a database".
On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote:
I intend to play with some optimizer aspects. Just for fun.
Cool. If you think it's fun (it is), you're halfway there.
I'm a novice in DBMS development so I cannot promise any usable
results, but if it can be useful even as yet another failed attempt,
I will try.
Simon Riggs [EMAIL PROTECTED] writes:
That's what I want to do:
1. Replace not very useful indexCorrelation with indexClustering.
An opinion such as "not very useful" isn't considered sufficient
explanation or justification for a change around here.
There's been some previous discussion about
Simon Riggs wrote:
This type of work is 90% analysis, 10% coding. You'll need to do a lot
of investigation, lots of discussion and listening.
I absolutely agree with you, and I am not about to rush into coding
right now. First of all I'm going to dig a lot into the PG sources,
READMEs, and so on.
On Tuesday, 12 September 2006 12:48, Say42 wrote:
That is why caching should be taken into account
during join cost calculation.
If you know of a more effective way to do that beyond the effective_cache_size
parameter that we have now, let us know.
--
Peter Eisentraut
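For reference, effective_cache_size feeds the Mackert-Lohman approximation that index_pages_fetched() uses to estimate how many distinct pages a scan will actually touch. A sketch of that formula (the variable names are mine: T = table pages, D = tuples fetched, b = cache size in pages):

```python
def pages_fetched(T, D, b):
    """Mackert-Lohman approximation of the distinct pages fetched when D
    tuples are read from a T-page table through a b-page cache (a sketch
    after the formula documented in PostgreSQL's index_pages_fetched)."""
    if T <= b:
        # Whole table fits in cache: fetches saturate at T pages.
        return min(2.0 * T * D / (2.0 * T + D), T)
    lim = 2.0 * T * b / (2.0 * T - b)
    if D <= lim:
        # Cache not yet exhausted by the scan.
        return 2.0 * T * D / (2.0 * T + D)
    # Beyond the cache limit, extra tuples cost a miss with probability (T-b)/T.
    return b + (D - lim) * (T - b) / T
```

The point of the parameter is visible in the first branch: a larger b lets more tables fall into the "saturates at T" regime instead of paying per-tuple misses.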
Gregory Stark [EMAIL PROTECTED] writes:
It's possible there's some sort of block-wise correlated measure
which would be even better for our needs.
Actually, it seems obvious to me that the correlation measure ought to
ignore within-block ordering, but right now it does not. OTOH it's not
clear
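One way to prototype what Tom describes (purely illustrative, not proposed code) is to correlate values against their block number instead of their exact row position, so ordering inside a block stops mattering:

```python
import math

def pearson(xs, ys):
    # Plain Pearson correlation coefficient, hand-rolled for clarity.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

ROWS_PER_BLOCK = 5  # hypothetical rows per page

# Four blocks whose value ranges ascend block by block, but rows inside
# each block are stored in descending order (e.g. 4,3,2,1,0 then 9,8,...).
values = []
for k in range(4):
    values.extend(range(5 * k + 4, 5 * k - 1, -1))

rows = list(range(len(values)))
blocks = [r // ROWS_PER_BLOCK for r in rows]

row_corr = pearson(rows, values)      # dragged down by in-block ordering
block_corr = pearson(blocks, values)  # ignores in-block ordering
```

The block-level measure stays near 1 here, matching the physical reality that a range scan would read the blocks in order.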
Peter Eisentraut wrote:
If you know of a more effective way to do that beyond the effective_cache_size
parameter that we have now, let us know.
I don't know a better way, and that is not my goal at all. I am thinking
about more accurate cost estimation of nested-loop joins and subqueries.
The usual case
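To make the nested-loop concern concrete, here is a deliberately crude sketch (all constants and formulas are invented for illustration, not PostgreSQL's): if a small inner master table stays cached after its first scan, charging full I/O for every rescan overstates the cost considerably.

```python
# Crude illustrative model of nested-loop join cost with a small inner
# (master) table: compare charging inner I/O on every rescan vs. once.
def nestloop_cost(outer_rows, inner_pages, inner_rows,
                  page_cost=1.0, cpu_tuple_cost=0.01, inner_cached=False):
    inner_io = inner_pages * page_cost        # I/O to read the inner table
    inner_cpu = inner_rows * cpu_tuple_cost   # CPU to scan its tuples
    if inner_cached:
        # Charge I/O only for the first scan; rescans pay CPU only.
        return inner_io + outer_rows * inner_cpu
    return outer_rows * (inner_io + inner_cpu)

naive = nestloop_cost(10_000, 5, 100)                      # I/O every rescan
cached = nestloop_cost(10_000, 5, 100, inner_cached=True)  # I/O once
```

Under this toy model the cached estimate is several times lower, which is the gap between the plan the optimizer prices and what actually happens with a hot cache.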
Say42 [EMAIL PROTECTED] writes:
The usual case in a data request is joining a detail table and some
master tables into a single relation.
Optimizing on the basis of only one example is seldom a good idea...
and I think you are falling into that trap by supposing that there
is a usual case.
Say42 wrote:
select count(*) from conn.conn20060803 c
where
exists (select code from belg_mobile tc
...
Correction: replace 'trunk_codes' with 'belg_mobile'.
Tom Lane wrote:
Optimizing on the basis of only one example is seldom a good idea...
and I think you are falling into that trap by supposing that there
is a usual case.
Perhaps I am wrong but I assume normalization is a usual case, small
master (parent) tables are not very rare also.
Yes, my