R: R: [PERFORM] Query plan on identical tables differs . Why ?

2004-05-14 Thread Fabio Panizzutti


-Messaggio originale-
Da: Tom Lane [mailto:[EMAIL PROTECTED] 
Inviato: giovedì 13 maggio 2004 17.01
A: Fabio Panizzutti
Cc: 'Shridhar Daithankar'; [EMAIL PROTECTED]
Oggetto: Re: R: [PERFORM] Query plan on identical tables 
differs . Why ? 


Fabio Panizzutti [EMAIL PROTECTED] writes:
 I don't understand why the planner chose a different query plan on 
 identical tables with same indexes .

Different data statistics; not to mention different table 
sizes (the cost equations are not linear).

Have you ANALYZEd (or VACUUM ANALYZEd) both tables recently?

If the stats are up to date but still not doing the right 
thing, you might try increasing the statistics target for 
the larger table's tag_id column.  See ALTER TABLE SET STATISTICS.

 regards, tom lane


All tables are vacumed and analyzed . 
I try so set statistics to 1000 to tag_id columns with ALTER TABLE SET
STATISTIC, revacuum analyze ,  but the planner choose the same query
plan . 
I'm trying now to change the indexes .

Thanks 




---(end of broadcast)---
TIP 8: explain analyze is your friend


R: R: [PERFORM] Query plan on identical tables differs . Why ?

2004-05-14 Thread Fabio Panizzutti


-Messaggio originale-
Da: Stephan Szabo [mailto:[EMAIL PROTECTED] 
Inviato: giovedì 13 maggio 2004 17.17
A: Fabio Panizzutti
Cc: 'Shridhar Daithankar'; [EMAIL PROTECTED]
Oggetto: Re: R: [PERFORM] Query plan on identical tables 
differs . Why ?


On Thu, 13 May 2004, Fabio Panizzutti wrote:


 I don't understand why the planner chose a different query plan on 
 identical tables with same indexes .

Because it's more than table structure that affects the 
choice made by the planner.  In addition the statistics 
about the values that are there as well as the estimated 
size of the table have effects.  One way to see is to see 
what it thinks is best is to remove the indexes it is using 
and see what plan it gives then, how long it takes and the 
estimated costs for those plans.

In other suggestions, I think having a (tag_id, data_tag) 
index rather than (data_tag, tag_id) may be a win for 
queries like this. Also, unless you're doing many select 
queries by only the first field of the composite index and 
you're not doing very many insert/update/deletes, you may 
want to drop the other index on just that field.


Thanks for your attention , i change the indexes on the tables as you
suggested  :

 storico=# \d storico_misure_short
Table tenore.storico_misure_short
 Column  |Type | Modifiers
-+-+---
 data_tag| timestamp without time zone | not null
 tag_id  | integer | not null
 unita_misura| character varying(6)| not null
 valore_tag  | numeric(20,3)   | not null
 qualita | integer | not null
 numero_campioni | numeric(5,0)|
 frequenza_campionamento | numeric(3,0)|
Indexes:
storico_misure_short_idx primary key, btree (tag_id, data_tag)
storico_misure_short_data_tag_idx2 btree (data_tag)

storico=# \d storico_misure
   Table tenore.storico_misure
 Column  |Type | Modifiers
-+-+---
 data_tag| timestamp without time zone | not null
 tag_id  | integer | not null
 unita_misura| character varying(6)| not null
 valore_tag  | numeric(20,3)   | not null
 qualita | integer | not null
 numero_campioni | numeric(5,0)|
 frequenza_campionamento | numeric(3,0)|
Indexes:
storico_misure_idx primary key, btree (tag_id, data_tag)
storico_misure_data_tag_idx2 btree (data_tag)

And now performance are similar and the planner works correctly :

storico=# \d storico_misure_short
Table tenore.storico_misure_short
 Column  |Type | Modifiers
-+-+---
 data_tag| timestamp without time zone | not null
 tag_id  | integer | not null
 unita_misura| character varying(6)| not null
 valore_tag  | numeric(20,3)   | not null
 qualita | integer | not null
 numero_campioni | numeric(5,0)|
 frequenza_campionamento | numeric(3,0)|
Indexes:
storico_misure_short_idx primary key, btree (tag_id, data_tag)
storico_misure_short_data_tag_idx2 btree (data_tag)

storico=# \d storico_misure
   Table tenore.storico_misure
 Column  |Type | Modifiers
-+-+---
 data_tag| timestamp without time zone | not null
 tag_id  | integer | not null
 unita_misura| character varying(6)| not null
 valore_tag  | numeric(20,3)   | not null
 qualita | integer | not null
 numero_campioni | numeric(5,0)|
 frequenza_campionamento | numeric(3,0)|
Indexes:
storico_misure_idx primary key, btree (tag_id, data_tag)
storico_misure_data_tag_idx2 btree (data_tag)

storico=# explain analyze select tag_id,valore_tag,data_tag from
storico_misure_short where (data_tag'2004-05-03' and data_tag
'2004-05-12') and tag_id=37423 ;
 
QUERY PLAN


--
 Index Scan using storico_misure_short_idx on storico_misure_short
(cost=0.00..2104.47 rows=584 width=20) (actual time=0.232..39.932
rows=864 loops=1)
   Index Cond: ((tag_id = 37423) AND (data_tag  '2004-05-03
00:00:00

R: R: [PERFORM] Query plan on identical tables differs . Why ?

2004-05-14 Thread Fabio Panizzutti


 Index Scan using pk_storico_misure_2 on storico_misure 
(cost=0.00..1984.64 rows=658 width=21) (actual 
time=723.441..1858.107 
rows=835 loops=1)
   Index Cond: ((data_tag  '2004-05-03 
00:00:00'::timestamp without 
time zone) AND (data_tag  '2004-05-12 00:00:00'::timestamp without 
time
zone) AND (tag_id = 37423))

Either most of the time is spent skipping index tuples in 
the data_tag range 2004-05-03 to 2004-05-12 which have 
tag_id  37423, or getting those 835 rows causes a lot of 
disk seeks.

If the former is true, an index on (tag_id, data_tag) will help.

Is true , i recreate the indexes making an index on  (tag_id, data_tag)
and works fine . 


In your first message you wrote:
fsync = false

Do this only if you don't care for your data.


I set it to false , for performance tests .I've a stored procedure that
make about 2000 insert in 2 tables and 2000 delete in another and with
fsync false perfomrmance are  2.000 -3.000 ms (stable) with fsync 3.000
ms to 15.000 ms . I trust in my hardware an O.S so fsync setting is a
big dubt for my production enviroment .

Thanks a lot

Bye 


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: R: R: [PERFORM] Query plan on identical tables differs . Why ?

2004-05-14 Thread Christopher Kings-Lynne
 I trust in my hardware an O.S so fsync setting is a
 big dubt for my production enviroment .

Then you are making a big mistake, loving your hardware more than your
data...

Chris



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


R: R: R: [PERFORM] Query plan on identical tables differs . Why ?

2004-05-14 Thread Fabio Panizzutti


-Messaggio originale-
Da: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] Per conto di 
Christopher Kings-Lynne
Inviato: venerdì 14 maggio 2004 11.55
A: Fabio Panizzutti
Cc: 'Manfred Koizar'; [EMAIL PROTECTED]
Oggetto: Re: R: R: [PERFORM] Query plan on identical tables 
differs . Why ?


 I trust in my hardware an O.S so fsync setting is a
 big dubt for my production enviroment .

Then you are making a big mistake, loving your hardware more 
than your data...

Chris



I'm testing for better performance in insert/delete so i turn off fsync
, i don't love hardware more than data , so i'll set fsync on in the
production enviroment .
Thanks a lot

Best regards

Fabio


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: R: [PERFORM] Query plan on identical tables differs . Why ?

2004-05-13 Thread Tom Lane
Fabio Panizzutti [EMAIL PROTECTED] writes:
 I don't understand why the planner chose a different query plan on
 identical tables with same indexes . 

Different data statistics; not to mention different table sizes
(the cost equations are not linear).

Have you ANALYZEd (or VACUUM ANALYZEd) both tables recently?

If the stats are up to date but still not doing the right thing,
you might try increasing the statistics target for the larger
table's tag_id column.  See ALTER TABLE SET STATISTICS.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


R: [PERFORM] Query plan on identical tables differs . Why ?

2004-05-13 Thread Fabio Panizzutti


-Messaggio originale-
Da: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] Per conto di 
Shridhar Daithankar
Inviato: giovedì 13 maggio 2004 15.05
A: Fabio Panizzutti
Cc: [EMAIL PROTECTED]
Oggetto: Re: [PERFORM] Query plan on identical tables differs . Why ?


Fabio Panizzutti wrote:
 storico=# explain select tag_id,valore_tag,data_tag from 
 storico_misure where (data_tag'2004-05-03' and data_tag 
 '2004-05-12') and tag_id=37423 ;

Can you please post explain analyze? That includes actual timings.

storico=# explain analyze select tag_id,valore_tag,data_tag from
storico_misure where (data_tag'2004-05-03' and data_tag '2004-05-12')
and tag_id=37423 ;
 
QUERY PLAN


--
 Index Scan using pk_storico_misure_2 on storico_misure
(cost=0.00..1984.64 rows=658 width=21) (actual time=723.441..1858.107
rows=835 loops=1)
   Index Cond: ((data_tag  '2004-05-03 00:00:00'::timestamp without
time zone) AND (data_tag  '2004-05-12 00:00:00'::timestamp without time
zone) AND (tag_id = 37423))
 Total runtime: 1860.641 ms
(3 rows)

storico=# explain analyze select tag_id,valore_tag,data_tag from
storico_misure_short where (data_tag'2004-05-03' and data_tag
'2004-05-12') and tag_id=37423 ;
 
QUERY PLAN


---
 Index Scan using pk_anagtstorico_misuree_short_idx_2 on
storico_misure_short  (cost=0.00..1783.04 rows=629 width=20) (actual
time=0.323..42.186 rows=864 loops=1)
   Index Cond: (tag_id = 37423)
   Filter: ((data_tag  '2004-05-03 00:00:00'::timestamp without time
zone) AND (data_tag  '2004-05-12 00:00:00'::timestamp without time
zone))
 Total runtime: 43.166 ms




Looking at the schema, can you try and 
tag_id=37423::integer instead?


I try : 
explain analyze select tag_id,valore_tag,data_tag from storico_misure
where (data_tag'2004-05-03' and data_tag '2004-05-12') and
tag_id=37423::integer;
Index Scan using pk_storico_misure_2 on storico_misure
(cost=0.00..1984.64 rows=658 width=21) (actual time=393.337..1303.998
rows=835 loops=1)
   Index Cond: ((data_tag  '2004-05-03 00:00:00'::timestamp without
time zone) AND (data_tag  '2004-05-12 00:00:00'::timestamp without time
zone) AND (tag_id = 37423))
 Total runtime: 1306.484 ms

 enable_hashagg = false
 enable_hashjoin = false
 enable_indexscan = true
 enable_mergejoin = true
 enable_nestloop = false
 enable_seqscan = true
 enable_sort = false
 enable_tidscan = false
Why do you have these off? AFAIK, 7.4 improved hash 
aggregates a lot. So you 
might miss on these in this case.

I try for debug purpose , now i reset all 'enable' to default :
 
select * from pg_settings where name like 'enable%';
   name   | setting | context | vartype |   source   |
min_val | max_val
--+-+-+-++--
---+-
 enable_hashagg   | on  | user| bool| configuration file |
|
 enable_hashjoin  | on  | user| bool| configuration file |
|
 enable_indexscan | on  | user| bool| configuration file |
|
 enable_mergejoin | on  | user| bool| configuration file |
|
 enable_nestloop  | on  | user| bool| configuration file |
|
 enable_seqscan   | on  | user| bool| configuration file |
|
 enable_sort  | on  | user| bool| configuration file |
|
 enable_tidscan   | on  | user| bool| configuration file |
|
(8 rows)

The query plan are the same 

 # - Planner Cost Constants -
 
 #effective_cache_size = 1000   # typically 8KB each

You might set it to something realistic.


I try 1 and 10 but nothing change .



And what is your hardware setup? Disks/CPU/RAM?

32GB SCSI/DUAL Intel(R) Pentium(R) III CPU family  1133MHz/  1GB RAM
and linux red-hat 9


I don't understand why the planner chose a different query plan on
identical tables with same indexes . 

Thanks a lot for help!.

Fabio


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html