[PERFORM] Any better plan for this query?..
Hi, any idea if there is a more optimal execution plan possible for this query:

select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg,
       H.END_DATE as hend, H.NOTE as hnote
  from HISTORY H, STAT S
 where S.REF = H.REF_STAT
   and H.REF_OBJECT = '01'
 order by H.HORDER ;

EXPLAIN ANALYZE output on 8.4:

                                QUERY PLAN
 Sort  (cost=4549.75..4555.76 rows=2404 width=176) (actual time=1.341..1.343 rows=20 loops=1)
   Sort Key: h.horder
   Sort Method:  quicksort  Memory: 30kB
   ->  Hash Join  (cost=33.50..4414.75 rows=2404 width=176) (actual time=1.200..1.232 rows=20 loops=1)
         Hash Cond: (h.ref_stat = s.ref)
         ->  Index Scan using history_ref_idx on history h  (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052 rows=20 loops=1)
               Index Cond: (ref_object = '01'::bpchar)
         ->  Hash  (cost=21.00..21.00 rows=1000 width=45) (actual time=1.147..1.147 rows=1000 loops=1)
               ->  Seq Scan on stat s  (cost=0.00..21.00 rows=1000 width=45) (actual time=0.005..0.325 rows=1000 loops=1)
 Total runtime: 1.442 ms
(10 rows)

Table HISTORY contains 200M rows, only 20 needed
Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values.

Table definitions:

create table STAT
(
    REF         CHAR(3)    not null,
    NAME        CHAR(40)   not null,
    NUMB        INT        not null
);

create table HISTORY
(
    REF_OBJECT  CHAR(10)   not null,
    HORDER      INT        not null,
    REF_STAT    CHAR(3)    not null,
    BEGIN_DATE  CHAR(12)   not null,
    END_DATE    CHAR(12)   ,
    NOTE        CHAR(100)
);

create unique index stat_ref_idx on STAT( ref );
create index history_ref_idx on HISTORY( ref_object, horder );

NOTE: The same query runs 2 times faster on MySQL.

Any idea?..

Rgds,
-Dimitri

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: Hi, any idea if there is a more optimal execution plan possible for this query: select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote from HISTORY H, STAT S where S.REF = H.REF_STAT and H.REF_OBJECT = '01' order by H.HORDER ; OK, so you're taking a simple: history INNER JOIN stat ON (stat.ref = history.ref_stat) then filtering for records with a particular value of history.ref_object and finally performing a sort. If I'm reading it right, the plan below does a sequential scan on the `stat' table. The stat table only has 1000 rows, so this isn't necessarily an unreasonable choice even if there is an appropriate index and even if not many of the rows will be needed. It then does an index scan of the history table looking for tuples with ref_object = '01' (text match). It hash joins the hashed results of the initial seq scan to the results of the index scan, and sorts the result. To me, that looks pretty reasonable. You might be able to avoid the hash join in favour of a nested loop scan of stat_ref_idx (looping over records from history.ref_stat where ref_object = '001') by providing a composite index on HISTORY(ref_stat, ref_object). I'm really not too sure, though; plan optimization isn't my thing, I'm just seeing if I can offer a few ideas. Table definitions: While not strictly necessary, it's a *REALLY* good idea to define a suitable PRIMARY KEY. Also, the `CHAR(n)' data type is evil. E.V.I.L. Use `varchar(n)' for bounded-length values, or `text' for unbounded fields, unless you REALLY want the crazy behaviour of `CHAR(n)'. I'm a little bit puzzled about why you seem to be doing lots of things with integer values stored in text strings, but that probably doesn't matter too much for the issue at hand. NOTE: The same query runs 2 times faster on MySQL. With InnoDB tables and proper transactional safety? Or using scary MyISAM tables and a just pray approach to data integrity? If you're using MyISAM tables I'm not surprised; MySQL with MyISAM is stunningly fast, but oh-my-god dangerous. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
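A rough SQL sketch of the schema changes suggested above (the index name is made up for illustration, and whether the planner actually switches to a nested loop still depends on costs and statistics):

  -- explicit primary key on the lookup table
  ALTER TABLE stat ADD PRIMARY KEY (ref);

  -- composite index mentioned above, to support an index probe on history
  CREATE INDEX history_refstat_refobj_idx ON history (ref_stat, ref_object);
  ANALYZE history;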
Re: [PERFORM] Any better plan for this query?..
Hi Craig, yes, you detailed very well the problem! :-) all those CHAR columns are so just due historical issues :-) as well they may contains anything else and not only numbers, that's why.. Also, all data inside are fixed, so VARCHAR will not save place, or what kind of performance issue may we expect with CHAR vs VARCHAR if all data have a fixed length?.. Any way to force nested loop without additional index?.. It's 2 times faster on InnoDB, and as it's just a SELECT query no need to go in transaction details :-) Rgds, -Dimitri On 5/6/09, Craig Ringer cr...@postnewspapers.com.au wrote: Dimitri wrote: Hi, any idea if there is a more optimal execution plan possible for this query: select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote from HISTORY H, STAT S where S.REF = H.REF_STAT and H.REF_OBJECT = '01' order by H.HORDER ; OK, so you're taking a simple: history INNER JOIN stat ON (stat.ref = history.ref_stat) then filtering for records with a particular value of history.ref_object and finally performing a sort. If I'm reading it right, the plan below does a sequential scan on the `stat' table. The stat table only has 1000 rows, so this isn't necessarily an unreasonable choice even if there is an appropriate index and even if not many of the rows will be needed. It then does an index scan of the history table looking for tuples with ref_object = '01' (text match). It hash joins the hashed results of the initial seq scan to the results of the index scan, and sorts the result. To me, that looks pretty reasonable. You might be able to avoid the hash join in favour of a nested loop scan of stat_ref_idx (looping over records from history.ref_stat where ref_object = '001') by providing a composite index on HISTORY(ref_stat, ref_object). I'm really not too sure, though; plan optimization isn't my thing, I'm just seeing if I can offer a few ideas. Table definitions: While not strictly necessary, it's a *REALLY* good idea to define a suitable PRIMARY KEY. Also, the `CHAR(n)' data type is evil. E.V.I.L. Use `varchar(n)' for bounded-length values, or `text' for unbounded fields, unless you REALLY want the crazy behaviour of `CHAR(n)'. I'm a little bit puzzled about why you seem to be doing lots of things with integer values stored in text strings, but that probably doesn't matter too much for the issue at hand. NOTE: The same query runs 2 times faster on MySQL. With InnoDB tables and proper transactional safety? Or using scary MyISAM tables and a just pray approach to data integrity? If you're using MyISAM tables I'm not surprised; MySQL with MyISAM is stunningly fast, but oh-my-god dangerous. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: any idea if there is a more optimal execution plan possible for this query: select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote from HISTORY H, STAT S where S.REF = H.REF_STAT and H.REF_OBJECT = '01' order by H.HORDER ; EXPLAIN ANALYZE output on 8.4: QUERY PLAN Sort (cost=4549.75..4555.76 rows=2404 width=176) (actual time=1.341..1.343 rows=20 loops=1) Sort Key: h.horder Sort Method: quicksort Memory: 30kB -> Hash Join (cost=33.50..4414.75 rows=2404 width=176) (actual time=1.200..1.232 rows=20 loops=1) Hash Cond: (h.ref_stat = s.ref) -> Index Scan using history_ref_idx on history h (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052 rows=20 loops=1) Index Cond: (ref_object = '01'::bpchar) -> Hash (cost=21.00..21.00 rows=1000 width=45) (actual time=1.147..1.147 rows=1000 loops=1) -> Seq Scan on stat s (cost=0.00..21.00 rows=1000 width=45) (actual time=0.005..0.325 rows=1000 loops=1) Total runtime: 1.442 ms (10 rows) Table HISTORY contains 200M rows, only 20 needed Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values. The plan doesn't look too bad to me, although the planner is over-estimating the number of matches in the history table (2404 vs 20). That's a bit surprising given how simple the predicate is. Make sure you've ANALYZEd the table. If that's not enough, you can try to increase the statistics target for ref_object column, ie. ALTER TABLE history ALTER COLUMN ref_object SET STATISTICS 500. That might give you a different plan, maybe with a nested loop join instead of hash join, which might be faster in this case. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
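Spelled out, the experiment suggested above looks something like this (500 is just the example target given; a fresh ANALYZE is needed for the new target to take effect):

  ANALYZE history;
  ALTER TABLE history ALTER COLUMN ref_object SET STATISTICS 500;
  ANALYZE history;
  EXPLAIN ANALYZE
    select S.REF, S.NAME, H.HORDER, H.BEGIN_DATE, H.END_DATE, H.NOTE
      from HISTORY H, STAT S
     where S.REF = H.REF_STAT and H.REF_OBJECT = '01'
     order by H.HORDER;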
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: Hi Craig, yes, you detailed very well the problem! :-) all those CHAR columns are so just due historical issues :-) as well they may contains anything else and not only numbers, that's why.. Also, all data inside are fixed, so VARCHAR will not save place, or what kind of performance issue may we expect with CHAR vs VARCHAR if all data have a fixed length?.. None in postgres, but the char/varchar thing may or may not bite you at some point later - sounds like you have it covered though. It's 2 times faster on InnoDB, and as it's just a SELECT query no need to go in transaction details :-) Total runtime: 1.442 ms (10 rows) You posted a query that's taking 2/1000's of a second. I don't really see a performance problem here :) -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Heikki, I've already tried a target 1000 and the only thing it changes compared to the current 100 (default) is that instead of 2404 rows it says 240 rows, but the plan remains the same.. Rgds, -Dimitri On 5/6/09, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Dimitri wrote: any idea if there is a more optimal execution plan possible for this query: select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote from HISTORY H, STAT S where S.REF = H.REF_STAT and H.REF_OBJECT = '01' order by H.HORDER ; EXPLAIN ANALYZE output on 8.4: QUERY PLAN Sort (cost=4549.75..4555.76 rows=2404 width=176) (actual time=1.341..1.343 rows=20 loops=1) Sort Key: h.horder Sort Method: quicksort Memory: 30kB -> Hash Join (cost=33.50..4414.75 rows=2404 width=176) (actual time=1.200..1.232 rows=20 loops=1) Hash Cond: (h.ref_stat = s.ref) -> Index Scan using history_ref_idx on history h (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052 rows=20 loops=1) Index Cond: (ref_object = '01'::bpchar) -> Hash (cost=21.00..21.00 rows=1000 width=45) (actual time=1.147..1.147 rows=1000 loops=1) -> Seq Scan on stat s (cost=0.00..21.00 rows=1000 width=45) (actual time=0.005..0.325 rows=1000 loops=1) Total runtime: 1.442 ms (10 rows) Table HISTORY contains 200M rows, only 20 needed Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values. The plan doesn't look too bad to me, although the planner is over-estimating the number of matches in the history table (2404 vs 20). That's a bit surprising given how simple the predicate is. Make sure you've ANALYZEd the table. If that's not enough, you can try to increase the statistics target for ref_object column, ie. ALTER TABLE history ALTER COLUMN ref_object SET STATISTICS 500. That might give you a different plan, maybe with a nested loop join instead of hash join, which might be faster in this case. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Chris, the only problem I see here is it's 2 times slower vs InnoDB, so before I'll say myself it's ok I want to be sure there is nothing else to do.. :-) Rgds, -Dimitri On 5/6/09, Chris dmag...@gmail.com wrote: Dimitri wrote: Hi Craig, yes, you detailed very well the problem! :-) all those CHAR columns are so just due historical issues :-) as well they may contains anything else and not only numbers, that's why.. Also, all data inside are fixed, so VARCHAR will not save place, or what kind of performance issue may we expect with CHAR vs VARCHAR if all data have a fixed length?.. None in postgres, but the char/varchar thing may or may not bite you at some point later - sounds like you have it covered though. It's 2 times faster on InnoDB, and as it's just a SELECT query no need to go in transaction details :-) Total runtime: 1.442 ms (10 rows) You posted a query that's taking 2/1000's of a second. I don't really see a performance problem here :) -- Postgresql php tutorials http://www.designmagick.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Richard, no, of course it's not based on explain :-) I've run several tests before and now going in depth to understand if there is nothing wrong. Due such a single query time difference InnoDB is doing 2-3 times better TPS level comparing to PostgreSQL.. Rgds, -Dimitri On 5/6/09, Richard Huxton d...@archonet.com wrote: Dimitri wrote: Hi Chris, the only problem I see here is it's 2 times slower vs InnoDB How do you know? This isn't just based on the explain values reported, is it? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: Hi Chris, the only problem I see here is it's 2 times slower vs InnoDB How do you know? This isn't just based on the explain values reported, is it? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: Hi Richard, no, of course it's not based on explain :-) I've run several tests before and now going in depth to understand if there is nothing wrong. Due such a single query time difference InnoDB is doing 2-3 times better TPS level comparing to PostgreSQL.. And you are satisfied that it is the planned query time that is the dominant factor here, and not parsing time, connection time, data transport, disk bandwidth etc? -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On Wed, May 6, 2009 at 3:38 AM, Dimitri dimitrik...@gmail.com wrote: Hi, any idea if there is a more optimal execution plan possible for this query: select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote from HISTORY H, STAT S where S.REF = H.REF_STAT and H.REF_OBJECT = '01' order by H.HORDER ; EXPLAIN ANALYZE output on 8.4: QUERY PLAN Sort (cost=4549.75..4555.76 rows=2404 width=176) (actual time=1.341..1.343 rows=20 loops=1) Sort Key: h.horder Sort Method: quicksort Memory: 30kB -> Hash Join (cost=33.50..4414.75 rows=2404 width=176) (actual time=1.200..1.232 rows=20 loops=1) Hash Cond: (h.ref_stat = s.ref) -> Index Scan using history_ref_idx on history h (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052 rows=20 loops=1) Index Cond: (ref_object = '01'::bpchar) -> Hash (cost=21.00..21.00 rows=1000 width=45) (actual time=1.147..1.147 rows=1000 loops=1) -> Seq Scan on stat s (cost=0.00..21.00 rows=1000 width=45) (actual time=0.005..0.325 rows=1000 loops=1) Total runtime: 1.442 ms (10 rows) Table HISTORY contains 200M rows, only 20 needed Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values. Table definitions: create table STAT ( REF CHAR(3) not null, NAME CHAR(40) not null, NUMB INT not null ); create table HISTORY ( REF_OBJECT CHAR(10) not null, HORDER INT not null, REF_STAT CHAR(3) not null, BEGIN_DATE CHAR(12) not null, END_DATE CHAR(12) , NOTE CHAR(100) ); create unique index stat_ref_idx on STAT( ref ); create index history_ref_idx on HISTORY( ref_object, horder ); NOTE: The same query runs 2 times faster on MySQL. couple of things to try: *) as others have noted, get rid of char() columns. use varchar, or int if you can. this is a bigger deal in postgres than mysql. *) curious if disabling sequential scan helps (set enable_seqscan = false) or changes the plan. .3 msec is spent on seq scan and an index lookup is likely much faster. *) prepare the query: prepare history_stat(char(10) as select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote from HISTORY H, STAT S where S.REF = H.REF_STAT and H.REF_OBJECT = $1 order by H.HORDER ; execute history_stat('01'); (prepared queries have some annoyances you need to be prepared to deal with. however, they are quite useful when squeezing every last msec out of fast queries). merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On Wed, May 6, 2009 at 7:46 AM, Merlin Moncure mmonc...@gmail.com wrote: prepare history_stat(char(10) as typo: prepare history_stat(char(10)) as -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
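Putting the two messages together, the corrected prepared-statement experiment would look roughly like this (enable_seqscan is the other knob mentioned above; both settings are session-local):

  PREPARE history_stat(char(10)) AS
    select S.REF as stref, S.NAME as stnm, H.HORDER as hord,
           H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote
      from HISTORY H, STAT S
     where S.REF = H.REF_STAT and H.REF_OBJECT = $1
     order by H.HORDER;

  EXECUTE history_stat('01');

  -- separate experiment: does the plan change (and get faster) without seq scans?
  SET enable_seqscan = off;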
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: I've run several tests before and now going in depth to understand if there is nothing wrong. Due such a single query time difference InnoDB is doing 2-3 times better TPS level comparing to PostgreSQL.. Why don't you use MySQL then? Or tune PostgreSQL? Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
I'll try to answer all mails at once :-))

- query is running fully in RAM, no I/O, no network, only CPU time
- looping 100 times the same query gives 132ms total time (~1.32ms per query), while it's 44ms on InnoDB (~0.44ms per query)
- disabling seq scan forces the planner to use an index scan, but it's actually worse as it gives 1.53ms per query..
- preparing the query helps: the prepare statement takes 16ms, but execute runs in 0.98ms, which makes me think it's not only planner overhead... And it's still 2 times slower than the 0.44ms. Also, generally prepare cannot be used in this test case as we suppose any query may be of any kind (even if it's not always true :-))
- char or varchar should be used here because the reference code is supposed to accept any characters (alphanumeric)
- it also reminds me that probably there is some extra CPU time due to locale settings, but all my lc_* variables are set to C...

Rgds,
-Dimitri

On 5/6/09, Merlin Moncure mmonc...@gmail.com wrote: On Wed, May 6, 2009 at 7:46 AM, Merlin Moncure mmonc...@gmail.com wrote: prepare history_stat(char(10) as typo: prepare history_stat(char(10)) as -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
The story is simple: for the launch of MySQL 5.4 I've done testing comparing the InnoDB engine variations available at that time, and at the end, out of curiosity, started the same test with PostgreSQL 8.3.7 to see if MySQL's performance level is closer to PostgreSQL now (PG was a strong true winner before). To my big surprise, MySQL 5.4 outpaced 8.3.7... However, analyzing the PostgreSQL processing I got a feeling something goes wrong on the PG side.. So, now I've installed both 8.3.7 and 8.4beta1 to see more in depth what's going on. Currently 8.4 performs much better than 8.3.7, but there is still room for improvement if such a small query may go faster :-) Rgds, -Dimitri On 5/6/09, Albe Laurenz laurenz.a...@wien.gv.at wrote: Dimitri wrote: I've run several tests before and now going in depth to understand if there is nothing wrong. Due such a single query time difference InnoDB is doing 2-3 times better TPS level comparing to PostgreSQL.. Why don't you use MySQL then? Or tune PostgreSQL? Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: I'll try to answer all mails at once :-)) - query is running fully in RAM, no I/O, no network, only CPU time - looping 100 times the same query gives 132ms total time (~1.32ms per query), while it's 44ms on InnoDB (~0.44ms per query) Well, assuming you're happy that PG is tuned reasonably for your machine and that MySQL's query cache isn't returning the results here it looks like MySQL is faster for this particular query. The only obvious place there could be a big gain is with the hashing algorithm. If you remove the ORDER BY and the query-time doesn't fall by much then it's the hash phase. The other thing to try is to alter the query to be a SELECT count(*) rather than returning rows - that will let you measure the time to transfer the result rows. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
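Concretely, the two timing variants suggested above are (same predicate, just isolating the sort phase and the row-transfer cost):

  -- 1) without the sort
  select S.REF, S.NAME, H.HORDER, H.BEGIN_DATE, H.END_DATE, H.NOTE
    from HISTORY H, STAT S
   where S.REF = H.REF_STAT and H.REF_OBJECT = '01';

  -- 2) without returning result rows
  select count(*)
    from HISTORY H, STAT S
   where S.REF = H.REF_STAT and H.REF_OBJECT = '01';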
Re: [PERFORM] Any better plan for this query?..
On Wed, May 06, 2009 at 04:01:03PM +0800, Craig Ringer wrote: Dimitri wrote: Hi, any idea if there is a more optimal execution plan possible for this query: select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote from HISTORY H, STAT S where S.REF = H.REF_STAT and H.REF_OBJECT = '01' order by H.HORDER ; OK, so you're taking a simple: history INNER JOIN stat ON (stat.ref = history.ref_stat) then filtering for records with a particular value of history.ref_object and finally performing a sort. If I'm reading it right, the plan below does a sequential scan on the `stat' table. The stat table only has 1000 rows, so this isn't necessarily an unreasonable choice even if there is an appropriate index and even if not many of the rows will be needed. It then does an index scan of the history table looking for tuples with ref_object = '01' (text match). It hash joins the hashed results of the initial seq scan to the results of the index scan, and sorts the result. To me, that looks pretty reasonable. You might be able to avoid the hash join in favour of a nested loop scan of stat_ref_idx (looping over records from history.ref_stat where ref_object = '001') by providing a composite index on HISTORY(ref_stat, ref_object). I'm really not too sure, though; plan optimization isn't my thing, I'm just seeing if I can offer a few ideas. Table definitions: While not strictly necessary, it's a *REALLY* good idea to define a suitable PRIMARY KEY. Also, the `CHAR(n)' data type is evil. E.V.I.L. Use `varchar(n)' for bounded-length values, or `text' for unbounded fields, unless you REALLY want the crazy behaviour of `CHAR(n)'. I'm a little bit puzzled about why you seem to be doing lots of things with integer values stored in text strings, but that probably doesn't matter too much for the issue at hand. NOTE: The same query runs 2 times faster on MySQL. With InnoDB tables and proper transactional safety? Or using scary MyISAM tables and a just pray approach to data integrity? If you're using MyISAM tables I'm not surprised; MySQL with MyISAM is stunningly fast, but oh-my-god dangerous. -- Craig Ringer I just thought I would ask. Are you using the query cache in MySQL? If that is on, that could be the difference. Another thing to check, try issuing the selects concurrently: 2 at a time, 5 at a time, 10 at a time... and see if that has an effect on timing. In many of the benchmarks, MySQL will out perform PostgreSQL for very low numbers of clients. Once you are using more than a handful, PostgreSQL pulls ahead. Also, is this a completely static table? i.e. no updates or inserts. How is the performance with those happening? This should help you get a clearer picture of the performance. My two cents. Ken -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
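One way to run the concurrency comparison suggested above is a custom pgbench script; the file and database names below are placeholders, and the -T (run for N seconds) option assumes the 8.4 pgbench:

  $ cat query.sql
  select S.REF, S.NAME, H.HORDER, H.BEGIN_DATE, H.END_DATE, H.NOTE
    from HISTORY H, STAT S
   where S.REF = H.REF_STAT and H.REF_OBJECT = '01'
   order by H.HORDER;

  $ pgbench -n -f query.sql -c 2 -T 30 testdb
  $ pgbench -n -f query.sql -c 5 -T 30 testdb
  $ pgbench -n -f query.sql -c 10 -T 30 testdb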
Re: [PERFORM] Any better plan for this query?..
On Wed, May 06, 2009 at 02:49:23PM +0200, Dimitri wrote: The story is simple: for the launching of MySQL 5.4 I've done a testing comparing available on that time variations of InnoDB engines, and at the end by curiosity started the same test with PostgreSQL 8.3.7 to see if MySQL performance level is more close to PostgreSQL now (PG was a strong true winner before). For my big surprise MySQL 5.4 outpassed 8.3.7... However, analyzing the PostgreSQL processing I got a feeling something goes wrong on PG side.. So, now I've installed both 8.3.7 and 8.4beta1 to see more in depth what's going on. Currently 8.4 performs much better than 8.3.7, but there is still a room for improvement if such a small query may go faster :-) Rgds, -Dimitri On 5/6/09, Albe Laurenz laurenz.a...@wien.gv.at wrote: Dimitri wrote: I've run several tests before and now going in depth to understand if there is nothing wrong. Due such a single query time difference InnoDB is doing 2-3 times better TPS level comparing to PostgreSQL.. Why don't you use MySQL then? Or tune PostgreSQL? Yours, Laurenz Albe Another thought, have you tuned PostgreSQL for an in memory database? Those tuning options may be what is needed to improve the plan chosen by PostgreSQL. Cheers, Ken -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
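For reference, "tuned for an in-memory database" usually boils down to postgresql.conf entries along these lines; the numbers are purely illustrative assumptions, not recommendations for this particular machine:

  shared_buffers = 4GB          # large enough to keep the working set in the buffer cache
  effective_cache_size = 12GB   # tell the planner the data is expected to be cached
  random_page_cost = 1.0        # random reads cost about the same as sequential when cached
  work_mem = 32MB               # per-sort/hash memory for each backend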
Re: [PERFORM] Any better plan for this query?..
On May 6, 2009, at 7:53 AM, Richard Huxton wrote: Dimitri wrote: I'll try to answer all mails at once :-)) - query is running fully in RAM, no I/O, no network, only CPU time - looping 100 times the same query gives 132ms total time (~1.32ms per query), while it's 44ms on InnoDB (~0.44ms per query) Well, assuming you're happy that PG is tuned reasonably for your machine and that MySQL's query cache isn't returning the results here it looks like MySQL is faster for this particular query. The only obvious place there could be a big gain is with the hashing algorithm. If you remove the ORDER BY and the query-time doesn't fall by much then it's the hash phase. The other thing to try is to alter the query to be a SELECT count(*) rather than returning rows - that will let you measure the time to transfer the result rows. -- Richard Huxton Archonet Ltd Do you expect to run this query 100 times per second in your application, or is this just a test to see how fast the query is for optimisation? I always get a bit scared myself of this kind of test that runs entirely out of memory, the reason being that this is usually not really the case in a production environment. Try to make a little test case where you give the query random parameters so different result sets are returned. This will give you a better idea of how fast the query really is and might give you better comparison results. Instead of count(*) I usually do explain analyze to see how fast PostgreSQL handles the query. Ries -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Folks, first of all:

- I used a fixed reference value just to simplify the case analysis and isolate it as much as possible; of course during my tests all values are random :-)
- the final goal of the test is to analyze scalability, so yes, concurrent sessions with random keys are growing from 1 to 256 (I run it on a 32-core server, no think time, just stressing), and the result is still not better compared to InnoDB
- I'm analyzing this query running in memory to understand what's blocking while all main bottlenecks are avoided (no I/O anymore, no network, etc.)
- the initial explain analyze and table details were posted in the first message

Now, let's go further:

- as-is, the query execution took 1.50ms
- after removing order by it took 1.19ms
- select count(*) instead of columns and with order by removed took 0.98ms
- execute of the same prepared select count(*) ... took 0.68ms

So, where is the time going?...

Rgds,
-Dimitri

On 5/6/09, Ries van Twisk p...@rvt.dds.nl wrote: On May 6, 2009, at 7:53 AM, Richard Huxton wrote: Dimitri wrote: I'll try to answer all mails at once :-)) - query is running fully in RAM, no I/O, no network, only CPU time - looping 100 times the same query gives 132ms total time (~1.32ms per query), while it's 44ms on InnoDB (~0.44ms per query) Well, assuming you're happy that PG is tuned reasonably for your machine and that MySQL's query cache isn't returning the results here it looks like MySQL is faster for this particular query. The only obvious place there could be a big gain is with the hashing algorithm. If you remove the ORDER BY and the query-time doesn't fall by much then it's the hash phase. The other thing to try is to alter the query to be a SELECT count(*) rather than returning rows - that will let you measure the time to transfer the result rows. -- Richard Huxton Archonet Ltd Do you expect to run this query 100 times per second in your application, or is this just a test to see how fast the query is for optimisation? I always get a bit scared myself of this kind of test that runs entirely out of memory, the reason being that this is usually not really the case in a production environment. Try to make a little test case where you give the query random parameters so different result sets are returned. This will give you a better idea of how fast the query really is and might give you better comparison results. Instead of count(*) I usually do explain analyze to see how fast PostgreSQL handles the query. Ries -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
No. Ken On Wed, May 06, 2009 at 10:04:33PM +0800, Craig Ringer wrote: Dimitri wrote: Hi Chris, the only problem I see here is it's 2 times slower vs InnoDB, so before I'll say myself it's ok I want to be sure there is nothing else to do.. :-) Can the genetic query optimizer come into play on small queries? -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On Wed, May 06, 2009 at 10:04:33PM +0800, Craig Ringer wrote: Can the genetic query optimizer come into play on small queries? On Wed, 6 May 2009, Kenneth Marshall wrote: No. Yes. But you would have had to have set some really weird configuration. Matthew -- And the lexer will say Oh look, there's a null string. Oooh, there's another. And another., and will fall over spectacularly when it realises there are actually rather a lot. - Computer Science Lecturer (edited) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
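For context: the genetic optimizer is only considered once a query has at least geqo_threshold FROM items (default 12), so getting it involved in a two-table join would take settings as unusual as:

  SET geqo = on;            -- on by default anyway
  SET geqo_threshold = 2;   -- default is 12; a 2-way join never reaches it otherwise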
Re: [PERFORM] Any better plan for this query?..
I supposed that with prepare and then execute the query optimizer no longer comes into play on the execute phase, or did I miss something?.. Forgot to say: the query cache is disabled on the MySQL side. Rgds, -Dimitri On 5/6/09, Craig Ringer cr...@postnewspapers.com.au wrote: Dimitri wrote: Hi Chris, the only problem I see here is it's 2 times slower vs InnoDB, so before I'll say myself it's ok I want to be sure there is nothing else to do.. :-) Can the genetic query optimizer come into play on small queries? -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: Hi Chris, the only problem I see here is it's 2 times slower vs InnoDB, so before I'll say myself it's ok I want to be sure there is nothing else to do.. :-) Can the genetic query optimizer come into play on small queries? -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On Wed, May 06, 2009 at 09:38:59AM +0200, Dimitri wrote: Hi, any idea if there is a more optimal execution plan possible for this query: select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote from HISTORY H, STAT S where S.REF = H.REF_STAT and H.REF_OBJECT = '01' order by H.HORDER ; EXPLAIN ANALYZE output on 8.4: QUERY PLAN Sort (cost=4549.75..4555.76 rows=2404 width=176) (actual time=1.341..1.343 rows=20 loops=1) Sort Key: h.horder Sort Method: quicksort Memory: 30kB -> Hash Join (cost=33.50..4414.75 rows=2404 width=176) (actual time=1.200..1.232 rows=20 loops=1) Hash Cond: (h.ref_stat = s.ref) -> Index Scan using history_ref_idx on history h (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052 rows=20 loops=1) Index Cond: (ref_object = '01'::bpchar) -> Hash (cost=21.00..21.00 rows=1000 width=45) (actual time=1.147..1.147 rows=1000 loops=1) -> Seq Scan on stat s (cost=0.00..21.00 rows=1000 width=45) (actual time=0.005..0.325 rows=1000 loops=1) Total runtime: 1.442 ms (10 rows) Table HISTORY contains 200M rows, only 20 needed Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values. Table definitions: create table STAT ( REF CHAR(3) not null, NAME CHAR(40) not null, NUMB INT not null ); create table HISTORY ( REF_OBJECT CHAR(10) not null, HORDER INT not null, REF_STAT CHAR(3) not null, BEGIN_DATE CHAR(12) not null, END_DATE CHAR(12) , NOTE CHAR(100) ); create unique index stat_ref_idx on STAT( ref ); create index history_ref_idx on HISTORY( ref_object, horder ); NOTE: The same query runs 2 times faster on MySQL. Any idea?.. Rgds, -Dimitri Dimitri, Is there any chance of profiling the postgres backend to see where the time is used? Just an idea, Ken -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Ken, yes, I may do it, but I did not expect to come into profiling initially :-) I expected there is just something trivial within a plan that I just don't know.. :-) BTW, is there already an integrated profiler within the code? or do I need external tools?.. Rgds, -Dimitri On 5/6/09, Kenneth Marshall k...@rice.edu wrote: On Wed, May 06, 2009 at 09:38:59AM +0200, Dimitri wrote: Hi, any idea if there is a more optimal execution plan possible for this query: select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote from HISTORY H, STAT S where S.REF = H.REF_STAT and H.REF_OBJECT = '01' order by H.HORDER ; EXPLAIN ANALYZE output on 8.4: QUERY PLAN Sort (cost=4549.75..4555.76 rows=2404 width=176) (actual time=1.341..1.343 rows=20 loops=1) Sort Key: h.horder Sort Method: quicksort Memory: 30kB -> Hash Join (cost=33.50..4414.75 rows=2404 width=176) (actual time=1.200..1.232 rows=20 loops=1) Hash Cond: (h.ref_stat = s.ref) -> Index Scan using history_ref_idx on history h (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052 rows=20 loops=1) Index Cond: (ref_object = '01'::bpchar) -> Hash (cost=21.00..21.00 rows=1000 width=45) (actual time=1.147..1.147 rows=1000 loops=1) -> Seq Scan on stat s (cost=0.00..21.00 rows=1000 width=45) (actual time=0.005..0.325 rows=1000 loops=1) Total runtime: 1.442 ms (10 rows) Table HISTORY contains 200M rows, only 20 needed Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values. Table definitions: create table STAT ( REF CHAR(3) not null, NAME CHAR(40) not null, NUMB INT not null ); create table HISTORY ( REF_OBJECT CHAR(10) not null, HORDER INT not null, REF_STAT CHAR(3) not null, BEGIN_DATE CHAR(12) not null, END_DATE CHAR(12) , NOTE CHAR(100) ); create unique index stat_ref_idx on STAT( ref ); create index history_ref_idx on HISTORY( ref_object, horder ); NOTE: The same query runs 2 times faster on MySQL. Any idea?.. Rgds, -Dimitri Dimitri, Is there any chance of profiling the postgres backend to see where the time is used? Just an idea, Ken -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
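There is no profiler integrated into the backend as such (though the server can be built with the --enable-profiling configure option for gprof output); the usual route is an OS-level profiler attached to the running backend. A rough sketch with oprofile on Linux, where the commands and the binary path are assumptions that vary by platform and version:

  opcontrol --init
  opcontrol --start
  # ... run the query loop against the backend under test ...
  opcontrol --dump
  opreport -l /usr/local/pgsql/bin/postgres | head -30   # top functions by sample count
  opcontrol --shutdown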
Re: [PERFORM] Any better plan for this query?..
On Wed, May 06, 2009 at 04:48:21PM +0200, Dimitri wrote: Hi Ken, yes, I may do it, but I did not expect to come into profiling initially :-) I expected there is just something trivial within a plan that I just don't know.. :-) BTW, is there already an integrated profiler within the code? or do I need external tools?.. Rgds, -Dimitri I only suggested it because it might have the effect of changing the sequential scan on the stat table to an indexed scan. Cheers, Ken On 5/6/09, Kenneth Marshall k...@rice.edu wrote: On Wed, May 06, 2009 at 09:38:59AM +0200, Dimitri wrote: Hi, any idea if there is a more optimal execution plan possible for this query: select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote from HISTORY H, STAT S where S.REF = H.REF_STAT and H.REF_OBJECT = '01' order by H.HORDER ; EXPLAIN ANALYZE output on 8.4: QUERY PLAN Sort (cost=4549.75..4555.76 rows=2404 width=176) (actual time=1.341..1.343 rows=20 loops=1) Sort Key: h.horder Sort Method: quicksort Memory: 30kB -> Hash Join (cost=33.50..4414.75 rows=2404 width=176) (actual time=1.200..1.232 rows=20 loops=1) Hash Cond: (h.ref_stat = s.ref) -> Index Scan using history_ref_idx on history h (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052 rows=20 loops=1) Index Cond: (ref_object = '01'::bpchar) -> Hash (cost=21.00..21.00 rows=1000 width=45) (actual time=1.147..1.147 rows=1000 loops=1) -> Seq Scan on stat s (cost=0.00..21.00 rows=1000 width=45) (actual time=0.005..0.325 rows=1000 loops=1) Total runtime: 1.442 ms (10 rows) Table HISTORY contains 200M rows, only 20 needed Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values. Table definitions: create table STAT ( REF CHAR(3) not null, NAME CHAR(40) not null, NUMB INT not null ); create table HISTORY ( REF_OBJECT CHAR(10) not null, HORDER INT not null, REF_STAT CHAR(3) not null, BEGIN_DATE CHAR(12) not null, END_DATE CHAR(12) , NOTE CHAR(100) ); create unique index stat_ref_idx on STAT( ref ); create index history_ref_idx on HISTORY( ref_object, horder ); NOTE: The same query runs 2 times faster on MySQL. Any idea?.. Rgds, -Dimitri Dimitri, Is there any chance of profiling the postgres backend to see where the time is used? Just an idea, Ken -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Transparent table partitioning in future version of PG?
On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote: The problem has been finding someone who has both the time and the ability to do the work. Unfortunately there has been significant debate over which parts of partitioning need to be improved. My own view is that considerable attention needs to be applied to both the executor and planner to improve matters and that syntax improvements are largely irrelevant, though seductive. Deep improvements will require significant analysis, agreement, effort and skill. What we have now took approximately 20 days to implement, with later patches adding about another 10-20 days work. I'd estimate the required work as 60-100 days work from primary author, plus planning and discussion time. YMMV. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On Wed, 2009-05-06 at 10:31 +0200, Dimitri wrote: I've already tried a target 1000 and the only thing it changes comparing to the current 100 (default) is instead of 2404 rows it says 240 rows, but the plan remaining the same.. Try both of these things * REINDEX on the index being used in the query, then re-EXPLAIN * enable_hashjoin = off, then re-EXPLAIN You should first attempt to get the same plan, then confirm it really is faster before we worry why the optimizer hadn't picked that plan. We already know that MySQL favors nested loop joins, so turning up a plan that on this occasion is actually better that way is in no way representative of general performance. Does MySQL support hash joins? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
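Spelled out, the two checks suggested above are (both cheap to try; enable_hashjoin only affects the current session):

  REINDEX INDEX history_ref_idx;
  EXPLAIN ANALYZE select ... ;   -- same query as before

  SET enable_hashjoin = off;
  EXPLAIN ANALYZE select ... ;   -- does a nested loop appear, and is it actually faster?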
Re: [PERFORM] Transparent table partitioning in future version of PG?
Simon Riggs si...@2ndquadrant.com writes: On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote: The problem has been finding someone who has both the time and the ability to do the work. Unfortunately there has been significant debate over which parts of partitioning need to be improved. My own view is that considerable attention needs to be applied to both the executor and planner to improve matters and that syntax improvements are largely irrelevant, though seductive. My thought about it is that what we really need is an explicit notion of partitioned tables built into the system, instead of trying to make the planner re-deduce the partitioning behavior from first principles every time it builds a plan for such a table. Such a notion would presumably involve some new syntax to allow the partitioning rule to be specified at table creation time. I agree that the syntax details are a minor issue, but the set of possible partitioning rules is certainly a topic of great interest. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
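For contrast, the 8.x behaviour being referred to is constraint exclusion over plain inheritance children, which the planner must re-derive from CHECK constraints every time it plans a query; the tables below are only an illustration of that mechanism:

  CREATE TABLE measurement (logdate date NOT NULL, value int);

  CREATE TABLE measurement_2009_04 (
      CHECK (logdate >= DATE '2009-04-01' AND logdate < DATE '2009-05-01')
  ) INHERITS (measurement);

  CREATE TABLE measurement_2009_05 (
      CHECK (logdate >= DATE '2009-05-01' AND logdate < DATE '2009-06-01')
  ) INHERITS (measurement);

  SET constraint_exclusion = on;
  -- the planner proves from the CHECK constraints that only one child can match
  EXPLAIN SELECT * FROM measurement WHERE logdate = DATE '2009-05-06';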
Re: [PERFORM] GiST index performance
Matthew Wakeling matt...@flymine.org writes: Here is my patch ported over to the seg contrib package, attached. Apply it to seg.c and all should be well. A similar thing needs to be done to cube, but I haven't looked at that. Teodor, Oleg, do you intend to review/apply this patch? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Transparent table partitioning in future version of PG?
On Wed, 2009-05-06 at 17:55 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote: The problem has been finding someone who has both the time and the ability to do the work. Unfortunately there has been significant debate over which parts of partitioning need to be improved. My own view is that considerable attention needs to be applied to both the executor and planner to improve matters and that syntax improvements are largely irrelevant, though seductive. My thought about it is that what we really need is an explicit notion of partitioned tables built into the system, instead of trying to make the planner re-deduce the partitioning behavior from first principles every time it builds a plan for such a table. Such a notion would presumably involve some new syntax to allow the partitioning rule to be specified at table creation time. I agree that the syntax details are a minor issue, but the set of possible partitioning rules is certainly a topic of great interest. Agreed. Perhaps I should say then that the syntax needs to express the requirements of the planner/executor behaviour, rather than being the main aspect of the feature, as some have suggested. Hopefully, notions of partitioning won't be directly tied to chunking of data for parallel query access. Most queries access recent data and hence only a single partition (or stripe), so partitioning and parallelism are frequently exactly orthogonal. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Transparent table partitioning in future version of PG?
Alvaro Herrera alvhe...@commandprompt.com writes: I think there should be a way to refer to individual partitions as objects. Yeah, the individual partitions should be nameable tables, otherwise we will be reinventing a *whole* lot of management stuff to little gain. I don't actually think there is anything wrong with using table inheritance as the basic infrastructure --- I just want more smarts about one particular use pattern of inheritance. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Transparent table partitioning in future version of PG?
Simon Riggs wrote: Hopefully, notions of partitioning won't be directly tied to chunking of data for parallel query access. Most queries access recent data and hence only a single partition (or stripe), so partitioning and parallelism and frequently exactly orthogonal. I think there should be a way to refer to individual partitions as objects. That way we could execute some commands to enable certain optimizations, for example "mark this partition read only", which would mean it could be marked as not needing vacuum. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance