Hi,
Stephan Szabo wrote:
> On Tue, 16 Oct 2001, CoL wrote:
>
>
>>---------------------------
>>The 2 table query, where prog_data has ~8800 rowsn and index on prog_id:
>>bash-2.04$ time echo "explain select distinct
>>prog_id,prog_ftype,prog_fcasthour,prog_date from prog_dgy_xy,prog_data
>>where pxygy_pid=prog_id " | psql -Uuser db
>>NOTICE: QUERY PLAN:
>>
>>Unique (cost=7432549.69..7680455.07 rows=2479054 width=32)
>> -> Sort (cost=7432549.69..7432549.69 rows=24790538 width=32)
>> -> Merge Join (cost=148864.65..161189.33 rows=24790538 width=32)
>> -> Index Scan using prog_data_pkey on prog_data
>>(cost=0.00..701.12 rows=8872 width=28)
>> -> Sort (cost=148864.65..148864.65 rows=921013 width=4)
>> -> Seq Scan on prog_dgy_xy (cost=0.00..30145.13
>>rows=921013 width=4)
>>
>
> I'm guessing that the approximately 25 million row estimate on the join
> has to be wrong as well given that prog_data.prog_id should be unique.
>
> Hmm, does the explain change if you vacuum analyze the other table
> (prog_data)? If not, what does explain show if you do a
> set enable_seqscan='off';
> before it?
The result:
db=>set enable_seqscan='off';
db=>explain select distinct prog_id,prog_ftype,prog_fcasthour,prog_date
from prog_dgy_xy,prog_data where pxygy_pid=prog_id;
NOTICE: QUERY PLAN:
Unique (cost=7606982.10..7854887.48 rows=2479054 width=32)
-> Sort (cost=7606982.10..7606982.10 rows=24790538 width=32)
-> Merge Join (cost=0.00..335621.73 rows=24790538 width=32)
-> Index Scan using progdgyxy_idx2 on prog_dgy_xy
(cost=0.00..323297.05 rows=921013 width=4)
-> Index Scan using prog_data_pkey on prog_data
(cost=0.00..701.12 rows=8872 width=28)
It "seems" index is used, but the same result :(((, and bigger execution
time: real 3m41.830s
What is in tables?
prog_data contains unique id and other info.
prog_dgy_xy contains that id with x,y coordinates (so many ids from
prog_data with unique x,y)
#prog_data:
#prog_id, prog_ftype, prog_fcasthour, prog_date
#1
'type' 6 2001-10-14 12:00:00
#2
'type' 12 2001-10-14 12:00:00
#prog_dgy_xy:
#pxygy_pid, pxygy_x, pxygy_y
#1
0.1 0.1
#1 0.1 0.15
How can this query takes real 0m1.755s for mysql, [17 sec for
oracle], and 2-3 minutes!! for postgres?
And why:
POSTGRES:
set enable_seqscan ='off'; select count(*) from prog_dgy_xy where
pxygy_pid<13161;
count
--------
900029
real 2m34.340s
explain:
Aggregate (cost=327896.89..327896.89 rows=1 width=0)
-> Index Scan using progdgyxy_idx2 on prog_dgy_xy
(cost=0.00..325594.54 rows=920940 width=0)
MYSQL:
select count(pxygy_pid) from PROG_DGY_XY where pxygy_pid<13161
count(pxygy_pid)
900029
real 0m27.878s
explain:
table type possible_keys key key_len ref rows Extra
PROG_DGY_XY range progdgyxy_idx1,progdgyxy_idx2 progdgyxy_idx2
4 NULL 906856 where used; Using index
The same time difference in case of: = or >, however explain says, cause
seq scan is off, the index is used.
I did vacuum, and vacuum analyze too before.
PS: I think i have to make a site for that, cause there are many
questions :), and weird things.
I love postgres but this makes me "hm?". Today i'll make these test
under 7.1.2.
thx
CoL
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])