Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Peter Eisentraut
Say42 wrote: Perhaps I am wrong but I assume normalization is a usual case, small master (parent) tables are not very rare also. Yes, my example is unusual but it is _real_ and demonstrate PG optimizer inaccuracy. Why don't we make PG optimizer more close to reality if we can? Is it so

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Say42
Peter Eisentraut wrote: But you haven't offered any analysis about the cause of this problem, so any speculation about normalization, usual cases, caching effects and so on are unfounded and premature. Ok. My previous message was a bit pompous and unfounded. Sorry. Below I'll try to explain

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Tom Lane
Say42 [EMAIL PROTECTED] writes: ... Let's take my pervious example (I repost query and some lines from 'explain' here for convenience): select count(*) from conn.conn20060803 c where exists (select code from belg_mobile tc where c.bnum = tc.code and c.bnum like tc.code || '%'

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Ron Mayer
Simon Riggs wrote: On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote: That's what I want to do: 1. Replace not very useful indexCorrelation with indexClustering. An opinion such as not very useful isn't considered sufficient explanation or justification for a change around here. Not

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Gregory Stark
Ron Mayer [EMAIL PROTECTED] writes: It's common here for queries to vastly overestimate the number of pages that would need to be read because postgresql's guess at the correlation being practically 0 despite the fact that the distinct values for any given column are closely packed on a few

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread AgentM
On Sep 13, 2006, at 14:44 , Gregory Stark wrote: I think we need a serious statistics jock to pipe up with some standard metrics that do what we need. Otherwise we'll never have a solid footing for the predictions we make and will never know how much we can trust them. That said I'm

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Ron Mayer
Gregory Stark wrote: Ron Mayer [EMAIL PROTECTED] writes: ...vastly overestimate the number of pages .. because postgresql's guess at the correlation being practically 0 despite the fact that the distinct values for any given column are closely packed on a few pages. I think we need a

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes: I'm wildly guessing that, the # of pages itself seems to be a bigger factor than the sequential/random nature. No, they're both important: fetching N pages in sequence is way cheaper than fetching the same number of pages scattered all over. This is partly

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Joshua Reich
Ron Mayer wrote: Gregory Stark wrote: Ron Mayer [EMAIL PROTECTED] writes: ...vastly overestimate the number of pages .. because postgresql's guess at the correlation being practically 0 despite the fact that the distinct values for any given column are closely packed on a few pages.

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Tom Lane
Joshua Reich [EMAIL PROTECTED] writes: I lurk... I don't know if I'm a 'statistics jock', but I may be valuable if only I had a better understanding of how the optimizer works. I have been following this thread with interest, but could really do with a good pointer to background

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Say42
Tom Lane wrote: I'm having a hard time getting excited about improving this query when it's so badly coded in the first place. What's an ORDER BY doing in an EXISTS subquery? The LIMIT is unnecessary too. And the inner WHERE says nothing so much as I don't know how to design a database

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Simon Riggs
On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote: I intend to play with some optimizer aspects. Just for fun. Cool. If you think its fun (it is), you're half way there. I'm a novice in the DBMS development so I can not promise any available results but if it can be useful even as yet another

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes: That's what I want to do: 1. Replace not very useful indexCorrelation with indexClustering. An opinion such as not very useful isn't considered sufficient explanation or justification for a change around here. There's been some previous discussion about

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Simon Riggs wrote: This type of work is 90% analysis, 10% coding. You'll need to do a lot of investigation, lots of discussion and listening. I absolutely agree with you and I am not about to rush into coding right now. First of all I'm going to dig a lot in the PG sources, readme's and so on.

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Peter Eisentraut
Am Dienstag, 12. September 2006 12:48 schrieb Say42: That is why caching should be taking into account during joining cost calculation. If you know of a more effective way to do that beyond the effective_cache_size parameter that we have now, let us know. -- Peter Eisentraut

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes: It's possible there's some sort of block-wise correlated measure which would be even better for our needs. Actually, it seems obvious to me that the correlation measure ought to ignore within-block ordering, but right now it does not. OTOH it's not clear

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Peter Eisentraut wrote: If you know of a more effective way to do that beyond the effective_cache_size parameter that we have now, let us know. I don't know the better way and it is not my goal at all. I think about more accurate cost estimation of nested loops join and subqueries. Usual case

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Tom Lane
Say42 [EMAIL PROTECTED] writes: Usual case in data request is a joining detail and some master tables into a single relation. Optimizing on the basis of only one example is seldom a good idea... and I think you are falling into that trap by supposing that there is a usual case.

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Say42 wrote: select count(*) from conn.conn20060803 c where exists (select code from belg_mobile tc ... Correction: replace 'trunk_codes' with 'belg_mobile'. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an

Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Tom Lane wrote: Optimizing on the basis of only one example is seldom a good idea... and I think you are falling into that trap by supposing that there is a usual case. Perhaps I am wrong but I assume normalization is a usual case, small master (parent) tables are not very rare also. Yes, my

[HACKERS] Optimizer improvements: to do or not to do?

2006-09-11 Thread Say42
I intend to play with some optimizer aspects. Just for fun. I'm a novice in the DBMS development so I can not promise any available results but if it can be useful even as yet another failed attempt I will try. That's what I want to do: 1. Replace not very useful indexCorrelation with