Re: [HACKERS] Correlation in cost_index()

2003-08-20 Thread Manfred Koizar
On Fri, 8 Aug 2003 16:53:48 -0700, Sean Chittenden [EMAIL PROTECTED] wrote: the problem with your patch was that it picked an index less often than the current code when there was low correlation. Maybe bit rot? What version did you apply the patch against? Here is a new version for Postgres

Re: [HACKERS] Correlation in cost_index()

2003-08-14 Thread Sean Chittenden
Which suggests to me that line 3964 in ./src/backend/utils/adt/selfuncs.c isn't right for multi-column indexes, esp for indexes that are clustered. I don't know how to address this though... Tom, any hints? Yes, we knew that already. Oliver had suggested simply dropping the division

Re: [HACKERS] Correlation in cost_index()

2003-08-14 Thread Zeugswetter Andreas SB SD
In both cases ANALYZE will calculate correlation 1.0 for column X, and something near zero for column Y. We would like to come out with index correlation 1.0 for the left-hand case and something much less (but, perhaps, not zero) for the right-hand case. I don't really see a way to do this

Re: [HACKERS] Correlation in cost_index()

2003-08-14 Thread Manfred Koizar
On Fri, 8 Aug 2003 11:06:56 -0700, Sean Chittenden [EMAIL PROTECTED] wrote: [...] it'd seem as though an avg depth of nodes in index * tuples_fetched * (random_io_cost * indexCorrelation) would be closer than where we are now... Index depth does not belong here because we walk down the index only

Re: [HACKERS] Correlation in cost_index()

2003-08-14 Thread Tom Lane
Sean Chittenden [EMAIL PROTECTED] writes: indexCorrelation is 1.0 for the 1st key in a multi-column index. ... only if it's perfectly correlated. As things stand, however, if a multi-column key is used, the indexCorrelation is penalized by the size of the number of keys found in the

Re: [HACKERS] Correlation in cost_index()

2003-08-11 Thread Sean Chittenden
AFAICS (part of) the real problem is in costsize.c:cost_index() where IO_cost is calculated from min_IO_cost, pages_fetched, random_page_cost, and indexCorrelation. The current implementation uses indexCorrelation^2 to interpolate between min_IO_cost and max_IO_cost, which IMHO gives

Re: [HACKERS] Correlation in cost_index()

2003-08-11 Thread Manfred Koizar
On Fri, 08 Aug 2003 18:25:41 -0400, Tom Lane [EMAIL PROTECTED] wrote: Two examples: [...] One more example: X Y A A a B A C b A B B b C C A c B C C

Re: [HACKERS] Correlation in cost_index()

2003-08-10 Thread Manfred Koizar
On Thu, 7 Aug 2003 13:44:19 -0700, Sean Chittenden [EMAIL PROTECTED] wrote: The indexCorrelation^2 algorithm was only a quick hack with no theory behind it :-(. I've wanted to find some better method to put in there, but have not had any time to research the problem. Could we quick hack it to

Re: [HACKERS] Correlation in cost_index()

2003-08-09 Thread Sean Chittenden
# SHOW effective_cache_size ; effective_cache_size -- 4456 (1 row) Only 35 MB? Are you testing on such a small machine? Testing on my laptop right now... can't hack on my production DBs the same way I can my laptop. The stats are attached bzip2 compressed.

Re: [HACKERS] Correlation in cost_index()

2003-08-09 Thread Sean Chittenden
Hrm, after an hour of searching and reading, I think one of the better papers on the subject can be found here: http://www.cs.ust.hk/faculty/dimitris/PAPERS/TKDE-NNmodels.pdf Interesting paper, but I don't see the connection to index order correlation? Nothing that I found was nearly

Re: [HACKERS] Correlation in cost_index()

2003-08-08 Thread Manfred Koizar
On Fri, 8 Aug 2003 16:53:48 -0700, Sean Chittenden [EMAIL PROTECTED] wrote: # SHOW effective_cache_size ; effective_cache_size -- 4456 (1 row) Only 35 MB? Are you testing on such a small machine? The stats are attached bzip2 compressed. Nothing was attached. Did you

Re: [HACKERS] Correlation in cost_index()

2003-08-08 Thread Tom Lane
Sean Chittenden [EMAIL PROTECTED] writes: Which suggests to me that line 3964 in ./src/backend/utils/adt/selfuncs.c isn't right for multi-column indexes, esp for indexes that are clustered. I don't know how to address this though... Tom, any hints? Yes, we knew that already. Oliver had

Re: [HACKERS] Correlation in cost_index()

2003-08-07 Thread Tom Lane
Sean Chittenden [EMAIL PROTECTED] writes: Hrm, after an hour of searching and reading, I think one of the better papers on the subject can be found here: http://www.cs.ust.hk/faculty/dimitris/PAPERS/TKDE-NNmodels.pdf Interesting paper, but I don't see the connection to index order correlation?

Re: [HACKERS] Correlation in cost_index()

2002-10-04 Thread Manfred Koizar
On Thu, 03 Oct 2002 14:50:00 -0400, Tom Lane [EMAIL PROTECTED] wrote: indexCorrelation is calculated by dividing the correlation of the first index column by the number of index columns. Yeah, I concluded later that that was bogus. I've been thinking of just using the correlation of the first

Re: [HACKERS] Correlation in cost_index()

2002-10-04 Thread Manfred Koizar
On Thu, 3 Oct 2002 10:45:08 -0600 (MDT), scott.marlowe [EMAIL PROTECTED] wrote: effective cache size is the default (i.e. commented out) The default is 1000, meaning ca. 8 MB, which seems to be way too low. If your server is (almost) exclusively used by Postgres, try setting it to represent

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Wed, 02 Oct 2002 18:48:49 -0400, Tom Lane [EMAIL PROTECTED] wrote: I don't think it's really a good idea to expect users to pick among multiple cost functions The idea is that PG is shipped with a default representing the best of our knowledge and users are not encouraged to change it. When

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), scott.marlowe [EMAIL PROTECTED] wrote: I'd certainly be willing to do some testing on my own data with them. Great! Gotta patch? Not yet. I've found that when the planner misses, sometimes it misses by HUGE amounts on large tables, and I have been

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), scott.marlowe [EMAIL PROTECTED] wrote: I'd certainly be willing to do some testing on my own data with them. Gotta patch? Yes, see below. Disclaimer: Apart from make; make check this is completely untested. Use at your own risk. Have fun! Servus

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Thu, 03 Oct 2002 12:40:20 +0200, I wrote: Gotta patch? Yes, see below. Oh, did I mention that inserting some break statements after the switch cases helps a lot? :-( Cavus venter non laborat libenter ... Servus Manfred ---(end of

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), scott.marlowe [EMAIL PROTECTED] wrote: I've found that when the planner misses, sometimes it misses by HUGE amounts on large tables, Scott, yet another question: are multicolunm indices involved in your estimator problems? Servus Manfred

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread scott.marlowe
On Thu, 3 Oct 2002, Manfred Koizar wrote: On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), scott.marlowe [EMAIL PROTECTED] wrote: I'd certainly be willing to do some testing on my own data with them. Great! Gotta patch? Not yet. I've found that when the planner misses, sometimes it

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread scott.marlowe
On Thu, 3 Oct 2002, Manfred Koizar wrote: On Wed, 2 Oct 2002 14:07:19 -0600 (MDT), scott.marlowe [EMAIL PROTECTED] wrote: I've found that when the planner misses, sometimes it misses by HUGE amounts on large tables, Scott, yet another question: are multicolunm indices involved in your

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Manfred Koizar
On Thu, 3 Oct 2002 10:59:54 -0600 (MDT), scott.marlowe [EMAIL PROTECTED] wrote: are multicolunm indices involved in your estimator problems? No. Although I use them a fair bit, none of the problems I've encountered so far have involved them. But I'd be willing to setup some test indexes and

Re: [HACKERS] Correlation in cost_index()

2002-10-03 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes: Never mind! I just stumbled over those lines in selfuncs.c where indexCorrelation is calculated by dividing the correlation of the first index column by the number of index columns. Yeah, I concluded later that that was bogus. I've been thinking of

[HACKERS] Correlation in cost_index()

2002-10-02 Thread Manfred Koizar
You all know this FAQ: Why does Postgres not use my index? Half of the time this problem can easily be solved by casting a literal to the type of the respective column; this is not my topic here. In many other cases it turns out that the planner over-estimates the cost of an index scan.

Re: [HACKERS] Correlation in cost_index()

2002-10-02 Thread scott.marlowe
On Wed, 2 Oct 2002, Manfred Koizar wrote: As nobody knows how each of these proposals performs in real life under different conditions, I suggest to leave the current implementation in, add all three algorithms, and supply a GUC variable to select a cost function. I'd certainly be willing

Re: [HACKERS] Correlation in cost_index()

2002-10-02 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes: AFAICS (part of) the real problem is in costsize.c:cost_index() where IO_cost is calculated from min_IO_cost, pages_fetched, random_page_cost, and indexCorrelation. The current implementation uses indexCorrelation^2 to interpolate between min_IO_cost