On Fri, 8 Aug 2003 16:53:48 -0700, Sean Chittenden
[EMAIL PROTECTED] wrote:
the problem with your patch was
that it picked an index less often than the current code when there
was low correlation.
Maybe bit rot? What version did you apply the patch against? Here is
a new version for Postgres
Which suggests to me that line 3964 in
./src/backend/utils/adt/selfuncs.c isn't right for multi-column
indexes, esp for indexes that are clustered. I don't know how to
address this though... Tom, any hints?
Yes, we knew that already. Oliver had suggested simply dropping the
division
In both cases ANALYZE will calculate correlation 1.0 for column X,
and something near zero for column Y. We would like to come out with
index correlation 1.0 for the left-hand case and something much less
(but, perhaps, not zero) for the right-hand case. I don't really see
a way to do this
On Fri, 8 Aug 2003 11:06:56 -0700, Sean Chittenden
[EMAIL PROTECTED] wrote:
[...] it'd seem as though an avg depth of
nodes in index * tuples_fetched * (random_io_cost * indexCorrelation)
would be closer than where we are now...
Index depth does not belong here because we walk down the index only
Sean Chittenden [EMAIL PROTECTED] writes:
indexCorrelation is 1.0 for the 1st key in a multi-column index.
... only if it's perfectly correlated.
As things stand, however, if a multi-column key is
used, the indexCorrelation is penalized by the number of
keys in the
AFAICS (part of) the real problem is in costsize.c:cost_index() where
IO_cost is calculated from min_IO_cost, pages_fetched,
random_page_cost, and indexCorrelation. The current implementation
uses indexCorrelation^2 to interpolate between min_IO_cost and
max_IO_cost, which IMHO gives
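The interpolation described above can be sketched as follows. The function and variable names are illustrative, but the quadratic blend between the best case (perfectly ordered heap) and worst case (random heap order) is the behavior being discussed for cost_index():

```python
def interpolated_io_cost(min_io_cost, max_io_cost, index_correlation):
    """Blend between min_IO_cost (heap perfectly ordered by the index)
    and max_IO_cost (heap in random order) using the square of the
    index correlation, as the current implementation does."""
    csquared = index_correlation ** 2
    return max_io_cost + csquared * (min_io_cost - max_io_cost)

# A half-correlated index still pays 75% of the random-order penalty,
# which is one way to see why squaring is a questionable choice:
print(interpolated_io_cost(10.0, 100.0, 0.5))
```

Because csquared drops off quickly as correlation falls below 1.0, the estimate moves toward max_IO_cost much faster than a linear blend would.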
On Fri, 08 Aug 2003 18:25:41 -0400, Tom Lane [EMAIL PROTECTED]
wrote:
Two examples: [...]
One more example:
X Y
A A
A B
A C
B A
B B
B C
C A
C B
C C
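To see why column X comes out near 1.0 and column Y much lower, one can compute a rank correlation between each row's physical position and its value order. This is only a rough stand-in for what ANALYZE actually computes (ANALYZE works from sampled histogram data), but it illustrates the effect:

```python
def rank_correlation(values):
    """Pearson correlation between physical row position and the
    tie-averaged rank of each value -- a rough illustration of the
    correlation statistic, not ANALYZE's exact algorithm."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable sort
    ranks = [0.0] * n
    i = 0
    while i < n:
        # find the run of equal values and give them their average rank
        j = i
        while j < n and values[order[j]] == values[order[i]]:
            j += 1
        avg = (i + j - 1) / 2.0
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    pos = list(range(n))
    mp = sum(pos) / n
    mr = sum(ranks) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(pos, ranks))
    varp = sum((p - mp) ** 2 for p in pos)
    varr = sum((r - mr) ** 2 for r in ranks)
    return cov / (varp * varr) ** 0.5

x = list("AAABBBCCC")   # first index column: already in heap order
y = list("ABCABCABC")   # second column: cycles within each X group
print(rank_correlation(x), rank_correlation(y))
```

Under this measure X scores above 0.9 while Y scores far lower, even though both columns take the same three values; the difference is entirely in how the values are laid out on disk.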
On Thu, 7 Aug 2003 13:44:19 -0700, Sean Chittenden
[EMAIL PROTECTED] wrote:
The indexCorrelation^2 algorithm was only a quick hack with no theory
behind it :-(. I've wanted to find some better method to put in there,
but have not had any time to research the problem.
Could we quick hack it to
# SHOW effective_cache_size ;
 effective_cache_size
----------------------
 4456
(1 row)
Only 35 MB? Are you testing on such a small machine?
Testing on my laptop right now... can't hack on my production DBs the
same way I can my laptop.
The stats are attached bzip2 compressed.
Hrm, after an hour of searching and reading, I think one of the
better papers on the subject can be found here:
http://www.cs.ust.hk/faculty/dimitris/PAPERS/TKDE-NNmodels.pdf
Interesting paper, but I don't see the connection to index order
correlation?
Nothing that I found was nearly
On Fri, 8 Aug 2003 16:53:48 -0700, Sean Chittenden
[EMAIL PROTECTED] wrote:
# SHOW effective_cache_size ;
 effective_cache_size
----------------------
 4456
(1 row)
Only 35 MB? Are you testing on such a small machine?
The stats are attached bzip2 compressed.
Nothing was attached. Did you
Sean Chittenden [EMAIL PROTECTED] writes:
Which suggests to me that line 3964 in
./src/backend/utils/adt/selfuncs.c isn't right for multi-column
indexes, esp for indexes that are clustered. I don't know how to
address this though... Tom, any hints?
Yes, we knew that already. Oliver had
Sean Chittenden [EMAIL PROTECTED] writes:
Hrm, after an hour of searching and reading, I think one of the better
papers on the subject can be found here:
http://www.cs.ust.hk/faculty/dimitris/PAPERS/TKDE-NNmodels.pdf
Interesting paper, but I don't see the connection to index order
correlation?
On Thu, 03 Oct 2002 14:50:00 -0400, Tom Lane [EMAIL PROTECTED]
wrote:
indexCorrelation is calculated by dividing the correlation of the
first index column by the number of index columns.
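The division described in that snippet can be sketched as below. The names are illustrative rather than the actual selfuncs.c code, but the sketch shows why a perfectly correlated first column gets penalized merely for the presence of extra key columns:

```python
def index_correlation_old(first_column_correlation, num_index_columns):
    # The heuristic under discussion: divide the first column's
    # correlation by the number of columns in the index.
    return first_column_correlation / num_index_columns

# A clustered three-column index whose first column is perfectly
# correlated is treated as if it were only one-third correlated:
print(index_correlation_old(1.0, 3))
```

That penalty is independent of the data in the trailing columns, which is exactly the objection raised in the thread.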
Yeah, I concluded later that that was bogus. I've been thinking of
just using the correlation of the first
On Thu, 3 Oct 2002 10:45:08 -0600 (MDT), scott.marlowe
[EMAIL PROTECTED] wrote:
effective cache size is the default (i.e. commented out)
The default is 1000, meaning ca. 8 MB, which seems to be way too low.
If your server is (almost) exclusively used by Postgres, try setting
it to represent
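The arithmetic behind these numbers is straightforward: effective_cache_size is measured in buffer pages, and a default build uses 8192-byte pages, so a setting of 4456 corresponds to only about 35 MB. A minimal sketch of the conversion (assuming the default page size):

```python
PAGE_SIZE = 8192  # bytes per buffer page in a default build

def pages_to_mb(pages):
    """Convert an effective_cache_size page count to megabytes."""
    return pages * PAGE_SIZE / (1024 * 1024)

def mb_to_pages(mb):
    """Convert a desired cache size in megabytes to a page count."""
    return mb * 1024 * 1024 // PAGE_SIZE

print(pages_to_mb(4456))   # the setting quoted above
print(mb_to_pages(256))    # e.g. crediting 256 MB of OS cache
```

The default of 1000 pages likewise works out to roughly 8 MB, which is why it is far too low for a machine whose OS cache holds hundreds of megabytes of table data.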
On Wed, 02 Oct 2002 18:48:49 -0400, Tom Lane [EMAIL PROTECTED]
wrote:
I don't think it's really a good idea to expect users to pick among
multiple cost functions
The idea is that PG is shipped with a default representing the best of
our knowledge and users are not encouraged to change it. When
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), scott.marlowe
[EMAIL PROTECTED] wrote:
I'd certainly be willing to do some testing on my own data with them.
Great!
Gotta patch?
Not yet.
I've found that when the planner misses, sometimes it misses
by HUGE amounts on large tables, and I have been
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), scott.marlowe
[EMAIL PROTECTED] wrote:
I'd certainly be willing to do some testing on my own data with them.
Gotta patch?
Yes, see below. Disclaimer: Apart from make; make check this is
completely untested. Use at your own risk. Have fun!
Servus
On Thu, 03 Oct 2002 12:40:20 +0200, I wrote:
Gotta patch?
Yes, see below.
Oh, did I mention that inserting some break statements after the
switch cases helps a lot? :-(
Cavus venter non laborat libenter ... (an empty belly does not work gladly)
Servus
Manfred
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), scott.marlowe
[EMAIL PROTECTED] wrote:
I've found that when the planner misses, sometimes it misses
by HUGE amounts on large tables,
Scott,
yet another question: are multicolumn indices involved in your
estimator problems?
Servus
Manfred
On Thu, 3 Oct 2002, Manfred Koizar wrote:
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), scott.marlowe
[EMAIL PROTECTED] wrote:
I'd certainly be willing to do some testing on my own data with them.
Great!
Gotta patch?
Not yet.
I've found that when the planner misses, sometimes it
On Thu, 3 Oct 2002, Manfred Koizar wrote:
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), scott.marlowe
[EMAIL PROTECTED] wrote:
I've found that when the planner misses, sometimes it misses
by HUGE amounts on large tables,
Scott,
yet another question: are multicolumn indices involved in your
On Thu, 3 Oct 2002 10:59:54 -0600 (MDT), scott.marlowe
[EMAIL PROTECTED] wrote:
are multicolumn indices involved in your estimator problems?
No. Although I use them a fair bit, none of the problems I've encountered
so far have involved them. But I'd be willing to setup some test indexes
and
Manfred Koizar [EMAIL PROTECTED] writes:
Never mind! I just stumbled over those lines in selfuncs.c where
indexCorrelation is calculated by dividing the correlation of the
first index column by the number of index columns.
Yeah, I concluded later that that was bogus. I've been thinking of
You all know this FAQ: Why does Postgres not use my index? Half of
the time this problem can easily be solved by casting a literal to the
type of the respective column; this is not my topic here.
In many other cases it turns out that the planner over-estimates the
cost of an index scan.
On Wed, 2 Oct 2002, Manfred Koizar wrote:
As nobody knows how each of these proposals performs in real life
under different conditions, I suggest leaving the current
implementation in, adding all three algorithms, and supplying a GUC
variable to select a cost function.
I'd certainly be willing
Manfred Koizar [EMAIL PROTECTED] writes:
AFAICS (part of) the real problem is in costsize.c:cost_index() where
IO_cost is calculated from min_IO_cost, pages_fetched,
random_page_cost, and indexCorrelation. The current implementation
uses indexCorrelation^2 to interpolate between min_IO_cost