Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Say42
Peter Eisentraut wrote:
 But you haven't offered any analysis about the cause of this problem, so any
 speculation about normalization, usual cases, caching effects and so on are
 unfounded and premature.

Ok. My previous message was a bit pompous and unfounded. Sorry.
Below I'll try to explain what I meant when I spoke about the caching
effect. Let's take my previous example (I repost the query and some
lines from 'explain' here for convenience):

select count(*) from conn.conn20060803 c where
exists (select code from belg_mobile tc
where c.bnum >= tc.code and c.bnum like tc.code || '%'
order by tc.code desc limit 1)

Index Scan Backward using belg_mobile_pkey on belg_mobile tc
(cost=0.00..6.42 rows=1 width=10)
(actual time=0.012..0.012 rows=0 loops=494527)

Seq Scan on belg_mobile tc
(cost=0.00..2.19 rows=1 width=10)
(actual time=0.096..0.099 rows=0 loops=494527)

belg_mobile is very small (68 rows, 1 heap page) and has a primary key
on the code column (2 index pages). indexCorrelation is 0.0445 and
barely affects the cost estimate.

PG cost estimation (as far as I know, of course):

Index scan cost = 2 (index pages) + 1 (heap page) * 4 (random_page_cost)
  + (0.0025 (cpu_operator_cost) * 3 (# ops) + 0.001 (cpu_index_tuple_cost)
  + 0.01 (cpu_tuple_cost)) * 68 (row count) * 0.33 (selectivity of the
  subquery qual)
  ~ 6 (page fetch cost) + 0.42 (cpu cost) = 6.42

Seq scan cost = 1 (heap page) + (0.0025 (cpu_operator_cost) * 3 (# ops)
  + 0.01 (cpu_tuple_cost)) * 68 (row count)
  = 1 (page fetch cost) + 1.19 (cpu cost) = 2.19
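
To double-check the arithmetic, here is a minimal C sketch of the same
formulas (not PostgreSQL source code; the constants are the stock 8.x
defaults assumed above, and the ~0.33 selectivity is my assumption of
the default inequality-qual selectivity):

#include <stdio.h>

int main(void)
{
    /* default planner cost parameters, hard-coded for the example */
    double random_page_cost     = 4.0;
    double cpu_operator_cost    = 0.0025;
    double cpu_index_tuple_cost = 0.001;
    double cpu_tuple_cost       = 0.01;

    double rows = 68.0, heap_pages = 1.0, index_pages = 2.0;
    double ineq_sel = 1.0 / 3.0;    /* assumed selectivity of the >= qual */

    /* same formulas as in the text above */
    double index_scan = index_pages + heap_pages * random_page_cost
        + (cpu_operator_cost * 3 + cpu_index_tuple_cost + cpu_tuple_cost)
          * rows * ineq_sel;
    double seq_scan = heap_pages
        + (cpu_operator_cost * 3 + cpu_tuple_cost) * rows;

    /* prints roughly 6.42 and 2.19, matching the planner's estimates */
    printf("index scan ~ %.2f, seq scan ~ %.2f\n", index_scan, seq_scan);
    return 0;
}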

The estimate is fine if we touch the belg_mobile table only once, but
the subquery touches it many times. After the first iteration of the
subquery all of belg_mobile's heap and index pages are in the cache,
so the cost per iteration should be estimated with these formulae:

Index scan cost = (6 (page fetch cost) + 0.42 (cpu cost)
  * 500K (conn table row count)) / 500K  ~ 0.42

Seq scan cost = (1 (page fetch cost) + 1.19 (cpu cost)
  * 500K (conn table row count)) / 500K  ~ 1.19

The index scan is then actually even cheaper, because fewer than one
tenth of the conn rows have a matching code in the belg_mobile table.
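
Written as code, the amortization above is just the following
(hypothetical helper, not planner code; it assumes the inner pages stay
cached after the first of the ~500K iterations):

/* Amortized per-iteration cost when the page fetches are paid only once
 * across n repetitions of the inner scan. */
static double amortized_cost(double page_fetch_cost, double cpu_cost, double n)
{
    return (page_fetch_cost + cpu_cost * n) / n;
}

/* amortized_cost(6.0, 0.42, 500000.0) ~ 0.42   (index scan)
 * amortized_cost(1.0, 1.19, 500000.0) ~ 1.19   (seq scan)
 * With caching accounted for, the index scan is the cheaper inner plan. */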

That's what I wanted to say. I am not a veteran DBMS user, so I cannot
gauge the overall importance of this cost inaccuracy. I hope you can
help me look at the problem (?) more widely than I can at the moment.




Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-13 Thread Say42
Tom Lane wrote:

 I'm having a hard time getting excited about improving this query when
 it's so badly coded in the first place.  What's an ORDER BY doing in
 an EXISTS subquery?  The LIMIT is unnecessary too.  And the inner WHERE
 says nothing so much as "I don't know how to design a database" :-(.

It was a test query that produces the same execution plan for
belg_mobile (and has the same problem) as the production query below:

select (select max(code) from belg_mobile tc
where c.bnum >= tc.code and c.bnum like tc.code || '%') as code,
  c.cause,
  c.ilno,
  extract(hour from c.datetime) as hour,
  count(*) as cnt,
  sum(c.dur) as dur
from conn.conn20060803 c
where itgrp = :itgrp
group by 1,2,3,4

It's a simple OLAP query for analyzing the distribution of telephone
traffic over time and trunk codes.
'max(code)' is used to get the best-matching code. For example,
84725 and 8472 are both valid codes, and the number 84725123456 must
match 84725 but not 8472. The 'c.bnum >= tc.code' qual significantly
reduces the index scan and the execution time.
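
To illustrate the matching rule with a toy example (made-up C and
made-up values; this is nothing the query or the server actually runs):

#include <stdio.h>
#include <string.h>

/* Return the longest code that is a prefix of the dialled number --
 * the same answer max(code) gives among the codes <= bnum that pass
 * the LIKE code || '%' filter. */
static const char *best_code(const char *bnum, const char **codes, int n)
{
    const char *best = NULL;

    for (int i = 0; i < n; i++)
        if (strncmp(bnum, codes[i], strlen(codes[i])) == 0 &&
            (best == NULL || strlen(codes[i]) > strlen(best)))
            best = codes[i];
    return best;
}

int main(void)
{
    const char *codes[] = { "8472", "84725" };

    /* prints 84725: the longer, more specific code wins */
    printf("%s\n", best_code("84725123456", codes, 2));
    return 0;
}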




Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Simon Riggs wrote:
 This type of work is 90% analysis, 10% coding. You'll need to do a lot
 of investigation, lots of discussion and listening.

I absolutely agree with you, and I am not about to rush into coding
right now. First of all I'm going to dig into the PG sources, READMEs
and so on. It's a good school for coding and for understanding DBMS
internals.

  That's what I want to do:
  1. Replace not very useful indexCorrelation with indexClustering.

 An opinion such as "not very useful" isn't considered sufficient
 explanation or justification for a change around here.

Sometimes indexCorrelation is even misleading. There are many examples
of index scan cost being overestimated (data are well clustered but not
ordered, so correlation is low), and some cases of underestimation
where tuples look well ordered, with a high degree of correlation, but
the index scan actually causes random page fetches (1-3-2-4-6-5, for
example; on a server without RAID that is VERY slow, 25 times slower
than a bitmap index scan). If we had a dedicated clustering measure we
could estimate the page count more precisely.
The next step could be to introduce 'ordering' as a measure of how
sequentially the pages are accessed. Without the 'ordering', all we can
do is assume that pages are fetched in random order. Anyhow, if index
access cost is overestimated we can set random_page_cost = 2. (Does
that hold in a production database with a smart RAID?)
Moreover, I think the problem is more complex. With the assumption that
index access is always random we run into another problem:
overestimation of the master table index scan. If the master table is
small enough, PG can choose a seq scan instead of an index scan even
when the latter is actually much cheaper because of caching. That is
why caching should be taken into account when calculating join costs.
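
As an illustration of what such a clustering measure could look like
(this is only my sketch, every name in it is made up, and it is not how
ANALYZE works today): probe the index for a number of sampled keys,
note the heap block of each TID returned, and divide the number of
distinct heap pages by the number of probes.

#include <stdlib.h>

/* Hypothetical clustering statistic: distinct heap pages touched per
 * index probe.  heap_blocks[] holds the block numbers of the TIDs
 * returned by nprobes sample probes (ntids entries in total).  A value
 * near ntids/nprobes means every probe hits fresh pages (scattered);
 * a value near 1 means the matching rows are packed together. */
static int cmp_block(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;

    return (x > y) - (x < y);
}

static double index_clustering(int *heap_blocks, int ntids, int nprobes)
{
    int distinct = 0;

    qsort(heap_blocks, ntids, sizeof(int), cmp_block);
    for (int i = 0; i < ntids; i++)
        if (i == 0 || heap_blocks[i] != heap_blocks[i - 1])
            distinct++;
    return (double) distinct / nprobes;
}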

  2. Consider caching of inner table in a nested loops join during
  estimation total cost of the join.

 I'd work on one thing at a time and go into it deeply.

Good news. So I'm very interested in what you think about my ideas.
Are they wrong or too naive?




Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Peter Eisentraut wrote:

 If you know of a more effective way to do that beyond the effective_cache_size
 parameter that we have now, let us know.

I don't know of a better way, and that is not my goal at all. I am
thinking about more accurate cost estimation for nested-loop joins and
subqueries.
The usual case in a query is joining a detail table and some master
tables into a single relation. Often the master tables are small and,
after a few nested-loop iterations, are well (perhaps wholly) cached.
The cost estimation of those tables' access paths does not care about
such caching and causes overestimation. In some cases this can lead to
choosing a plan that is not the best one.

An example from real life. The following query returns the count of
national calls from the call registration table.

select count(*) from conn.conn20060803 c
where
exists (select code from trunk_codes tc
where c.bnum >= tc.code and c.bnum like tc.code || '%'
order by tc.code desc limit 1)

enable_seqscan = off:

Aggregate  (cost=103185258.68..103185258.69 rows=1 width=0) (actual
time=13385.674..13385.676 rows=1 loops=1)
  ->  Seq Scan on conn20060803 c  (cost=1.00..103184640.52
rows=247264 width=0) (actual time=0.409..13307.254 rows=38739 loops=1)
Filter: (subplan)
SubPlan
  ->  Limit  (cost=0.00..6.42 rows=1 width=10) (actual
time=0.020..0.020 rows=0 loops=494527)
->  Index Scan Backward using belg_mobile_pkey on
belg_mobile tc  (cost=0.00..6.42 rows=1 width=10) (actual
time=0.012..0.012 rows=0 loops=494527)
  Index Cond: (($0)::text >= (code)::text)
  Filter: (($0)::text ~~ ((code)::text ||
'%'::text))
Total runtime: 13385.808 ms

enable_seqscan = on:

Aggregate  (cost=1101623.47..1101623.48 rows=1 width=0) (actual
time=63724.508..63724.509 rows=1 loops=1)
  ->  Seq Scan on conn20060803 c  (cost=0.00..1101005.30 rows=247264
width=0) (actual time=2.244..63640.413 rows=38739 loops=1)
Filter: (subplan)
SubPlan
  ->  Limit  (cost=2.20..2.20 rows=1 width=10) (actual
time=0.121..0.121 rows=0 loops=494527)
->  Sort  (cost=2.20..2.20 rows=1 width=10) (actual
time=0.114..0.114 rows=0 loops=494527)
  Sort Key: code
  ->  Seq Scan on belg_mobile tc  (cost=0.00..2.19
rows=1 width=10) (actual time=0.096..0.099 rows=0 loops=494527)
Filter: ((($0)::text >= (code)::text) AND
(($0)::text ~~ ((code)::text || '%'::text)))
Total runtime: 63724.630 ms




Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Say42 wrote:

 select count(*) from conn.conn20060803 c
 where
 exists (select code from belg_mobile tc
...

Correction: replace 'trunk_codes' with 'belg_mobile'.




Re: [HACKERS] Optimizer improvements: to do or not to do?

2006-09-12 Thread Say42
Tom Lane wrote:
 Optimizing on the basis of only one example is seldom a good idea...
 and I think you are falling into that trap by supposing that there
 is a usual case.

Perhaps I am wrong, but I assume normalization is the usual case, and
small master (parent) tables are not very rare either.
Yes, my example is unusual, but it is _real_ and it demonstrates the PG
optimizer's inaccuracy. Why not make the PG optimizer match reality
more closely if we can? Is that really so needless, am I making a
mountain out of a molehill?




[HACKERS] Optimizer improvements: to do or not to do?

2006-09-11 Thread Say42
I intend to play with some optimizer aspects. Just for fun. I'm a
novice in DBMS development, so I cannot promise any usable results, but
if it can be useful even as yet another failed attempt, I will try.

That's what I want to do:
1. Replace the not-very-useful indexCorrelation with indexClustering.
2. Consider caching of the inner table of a nested-loop join when
estimating the total cost of the join.

More details:
1. During ANALYZE we have sample rows. For every N-th sample row we can
scan the indexes with a qual like 'value = index_first_column' and
fetch the first N row TIDs. Estimating the number of heap pages fetched
is not hard. To get the index clustering value, just divide that page
count by the number of sampled rows.
2. This is much, much harder and may be beyond me altogether. The main
ideas:
- split page-fetch cost and CPU cost into separate variables and don't
sum them up before join estimation;
- do the final path cost estimation inside the join cost estimation,
taking into account the number of inner table accesses (= K). CPU cost
is directly proportional to K, but page fetches can be estimated with
the Mackert and Lohman formula using the total tuple count (K *
inner_table_selectivity * inner_table_total_tuples); a sketch of that
formula follows below.
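
For reference, here is the Mackert-Lohman page-fetch estimate written
out in C, following the description in the costsize.c comments as I
recall it (so treat the exact form as an assumption, not the
authoritative implementation): T is the number of table pages, Ns the
number of tuples fetched, b the number of cache pages available.

#include <stdio.h>

/* Mackert-Lohman estimate of distinct pages fetched when reading
 * tuples_fetched randomly-ordered tuples from a table of total_pages
 * pages with cache_pages of buffer space available. */
static double pages_fetched_ml(double tuples_fetched, double total_pages,
                               double cache_pages)
{
    double T  = (total_pages > 1.0) ? total_pages : 1.0;
    double Ns = tuples_fetched;
    double b  = (cache_pages > 1.0) ? cache_pages : 1.0;
    double pf;

    if (T <= b)
    {
        /* whole table fits in cache: never fetch a page twice */
        pf = (2.0 * T * Ns) / (2.0 * T + Ns);
        if (pf > T)
            pf = T;
    }
    else
    {
        double lim = (2.0 * T * b) / (2.0 * T - b);

        if (Ns <= lim)
            pf = (2.0 * T * Ns) / (2.0 * T + Ns);
        else
            pf = b + (Ns - lim) * (T - b) / T;
    }
    return pf;
}

int main(void)
{
    /* e.g. ~500K probes * 0.33 selectivity * 68 inner tuples against a
     * 1-page inner table: nearly every fetch after the first is cached */
    printf("%.2f\n", pages_fetched_ml(494527.0 * 0.33 * 68.0, 1.0, 1000.0));
    return 0;
}

The point of feeding K * inner_table_selectivity *
inner_table_total_tuples into it is that the page-fetch cost then gets
spread over all K repetitions of the inner scan instead of being
charged K times.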

Any thoughts?

