[PERFORM] Any better plan for this query?..

2009-05-06 Thread Dimitri
Hi,

any idea if there is a more optimal execution plan possible for this query:

select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg,
H.END_DATE as hend, H.NOTE as hnote
 from HISTORY H, STAT S
 where S.REF = H.REF_STAT
 and H.REF_OBJECT = '01'
 order by H.HORDER ;

EXPLAIN ANALYZE output on 8.4:
                                   QUERY PLAN
 ---------------------------------------------------------------------------
  Sort  (cost=4549.75..4555.76 rows=2404 width=176) (actual
 time=1.341..1.343 rows=20 loops=1)
    Sort Key: h.horder
    Sort Method:  quicksort  Memory: 30kB
    ->  Hash Join  (cost=33.50..4414.75 rows=2404 width=176) (actual
 time=1.200..1.232 rows=20 loops=1)
          Hash Cond: (h.ref_stat = s.ref)
          ->  Index Scan using history_ref_idx on history h
 (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052
 rows=20 loops=1)
                Index Cond: (ref_object = '01'::bpchar)
          ->  Hash  (cost=21.00..21.00 rows=1000 width=45) (actual
 time=1.147..1.147 rows=1000 loops=1)
                ->  Seq Scan on stat s  (cost=0.00..21.00 rows=1000
 width=45) (actual time=0.005..0.325 rows=1000 loops=1)
  Total runtime: 1.442 ms
 (10 rows)

Table HISTORY contains 200M rows, only 20 needed
Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values.

Table definitions:

create table STAT
(
    REF                 CHAR(3)            not null,
    NAME                CHAR(40)           not null,
    NUMB                INT                not null
);

create table HISTORY
(
    REF_OBJECT          CHAR(10)           not null,
    HORDER              INT                not null,
    REF_STAT            CHAR(3)            not null,
    BEGIN_DATE          CHAR(12)           not null,
    END_DATE            CHAR(12)           ,
    NOTE                CHAR(100)
);

create unique index stat_ref_idx on STAT( ref );
create index history_ref_idx on HISTORY( ref_object, horder );


NOTE: The same query runs 2 times faster on MySQL.

Any idea?..

Rgds,
-Dimitri

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Craig Ringer

Dimitri wrote:

Hi,

any idea if there is a more optimal execution plan possible for this query:

select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg,
H.END_DATE as hend, H.NOTE as hnote
 from HISTORY H, STAT S
 where S.REF = H.REF_STAT
 and H.REF_OBJECT = '01'
 order by H.HORDER ;


OK, so you're taking a simple:

   history INNER JOIN stat ON (stat.ref = history.ref_stat)

then filtering for records with a particular value of history.ref_object 
and finally performing a sort.


If I'm reading it right, the plan below does a sequential scan on the 
`stat' table. The stat table only has 1000 rows, so this isn't 
necessarily an unreasonable choice even if there is an appropriate index 
and even if not many of the rows will be needed.


It then does an index scan of the history table looking for tuples with 
ref_object = '01' (text match). It hash joins the hashed results 
of the initial seq scan to the results of the index scan, and sorts the 
result.


To me, that looks pretty reasonable. You might be able to avoid the hash 
join in favour of a nested loop scan of stat_ref_idx (looping over 
records from history.ref_stat where ref_object = '001') by 
providing a composite index on HISTORY(ref_stat, ref_object). I'm really 
not too sure, though; plan optimization isn't my thing, I'm just seeing 
if I can offer a few ideas.
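
In SQL, that suggestion would be something like the following (the index
name is illustrative, and whether the planner actually switches to a
nested loop with it is untested):

   create index history_stat_idx on HISTORY( ref_stat, ref_object );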



Table definitions:


While not strictly necessary, it's a *REALLY* good idea to define a 
suitable PRIMARY KEY.


Also, the `CHAR(n)' data type is evil. E.V.I.L. Use `varchar(n)' for 
bounded-length values, or `text' for unbounded fields, unless you REALLY 
want the crazy behaviour of `CHAR(n)'.
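
For instance, STAT could be declared along these lines (treating REF as
the primary key is an assumption, based on the unique index in the
original definitions):

   create table STAT
   (
      REF   varchar(3)   not null primary key,
      NAME  varchar(40)  not null,
      NUMB  int          not null
   );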


I'm a little bit puzzled about why you seem to be doing lots of things 
with integer values stored in text strings, but that probably doesn't 
matter too much for the issue at hand.



NOTE: The same query runs 2 times faster on MySQL.


With InnoDB tables and proper transactional safety? Or using scary 
MyISAM tables and a "just pray" approach to data integrity? If you're 
using MyISAM tables I'm not surprised; MySQL with MyISAM is stunningly 
fast, but oh-my-god dangerous.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Dimitri
Hi Craig,

yes, you detailed the problem very well! :-)
all those CHAR columns are just due to historical issues :-) as well,
they may contain anything else and not only numbers, that's why..
Also, all data inside are fixed-length, so VARCHAR will not save space, or
what kind of performance issue may we expect with CHAR vs VARCHAR if
all data have a fixed length?..

Any way to force nested loop without additional index?..

It's 2 times faster on InnoDB, and as it's just a SELECT query there's no need
to go into transaction details :-)

Rgds,
-Dimitri

On 5/6/09, Craig Ringer cr...@postnewspapers.com.au wrote:
 Dimitri wrote:
 Hi,

 any idea if there is a more optimal execution plan possible for this
 query:

 select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as
 hbeg,
 H.END_DATE as hend, H.NOTE as hnote
  from HISTORY H, STAT S
  where S.REF = H.REF_STAT
  and H.REF_OBJECT = '01'
  order by H.HORDER ;

 OK, so you're taking a simple:

 history INNER JOIN stat ON (stat.ref = history.ref_stat)

 then filtering for records with a particular value of history.ref_object
 and finally performing a sort.

 If I'm reading it right, the plan below does a sequential scan on the
 `stat' table. The stat table only has 1000 rows, so this isn't
 necessarily an unreasonable choice even if there is an appropriate index
 and even if not many of the rows will be needed.

 It then does an index scan of the history table looking for tuples with
 ref_object = '01' (text match). It hash joins the hashed results
 of the initial seq scan to the results of the index scan, and sorts the
 result.

 To me, that looks pretty reasonable. You might be able to avoid the hash
 join in favour of a nested loop scan of stat_ref_idx (looping over
 records from history.ref_stat where ref_object = '001') by
 providing a composite index on HISTORY(ref_stat, ref_object). I'm really
 not too sure, though; plan optimization isn't my thing, I'm just seeing
 if I can offer a few ideas.

 Table definitions:

 While not strictly necessary, it's a *REALLY* good idea to define a
 suitable PRIMARY KEY.

 Also, the `CHAR(n)' data type is evil. E.V.I.L. Use `varchar(n)' for
 bounded-length values, or `text' for unbounded fields, unless you REALLY
 want the crazy behaviour of `CHAR(n)'.

 I'm a little bit puzzled about why you seem to be doing lots of things
 with integer values stored in text strings, but that probably doesn't
 matter too much for the issue at hand.

 NOTE: The same query runs 2 times faster on MySQL.

 With InnoDB tables and proper transactional safety? Or using scary
 MyISAM tables and a "just pray" approach to data integrity? If you're
 using MyISAM tables I'm not surprised; MySQL with MyISAM is stunningly
 fast, but oh-my-god dangerous.

 --
 Craig Ringer


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Heikki Linnakangas

Dimitri wrote:

any idea if there is a more optimal execution plan possible for this query:

select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg,
H.END_DATE as hend, H.NOTE as hnote
 from HISTORY H, STAT S
 where S.REF = H.REF_STAT
 and H.REF_OBJECT = '01'
 order by H.HORDER ;

EXPLAIN ANALYZE output on 8.4:
                                   QUERY PLAN
 ---------------------------------------------------------------------------
  Sort  (cost=4549.75..4555.76 rows=2404 width=176) (actual
 time=1.341..1.343 rows=20 loops=1)
    Sort Key: h.horder
    Sort Method:  quicksort  Memory: 30kB
    ->  Hash Join  (cost=33.50..4414.75 rows=2404 width=176) (actual
 time=1.200..1.232 rows=20 loops=1)
          Hash Cond: (h.ref_stat = s.ref)
          ->  Index Scan using history_ref_idx on history h
 (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052
 rows=20 loops=1)
                Index Cond: (ref_object = '01'::bpchar)
          ->  Hash  (cost=21.00..21.00 rows=1000 width=45) (actual
 time=1.147..1.147 rows=1000 loops=1)
                ->  Seq Scan on stat s  (cost=0.00..21.00 rows=1000
 width=45) (actual time=0.005..0.325 rows=1000 loops=1)
  Total runtime: 1.442 ms
 (10 rows)

Table HISTORY contains 200M rows, only 20 needed
Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values.


The plan doesn't look too bad to me, although the planner is 
over-estimating the number of matches in the history table (2404 vs 20). 
That's a bit surprising given how simple the predicate is. Make sure 
you've ANALYZEd the table. If that's not enough, you can try to increase 
the statistics target for the ref_object column, i.e. ALTER TABLE history 
ALTER COLUMN ref_object SET STATISTICS 500. That might give you a 
different plan, maybe with a nested loop join instead of a hash join, 
which might be faster in this case.
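
A minimal sketch of that sequence (the target of 500 is just an example
value; re-ANALYZE so the new target takes effect):

   ALTER TABLE history ALTER COLUMN ref_object SET STATISTICS 500;
   ANALYZE history;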


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Chris

Dimitri wrote:

Hi Craig,

yes, you detailed the problem very well! :-)
all those CHAR columns are just due to historical issues :-) as well,
they may contain anything else and not only numbers, that's why..
Also, all data inside are fixed-length, so VARCHAR will not save space, or
what kind of performance issue may we expect with CHAR vs VARCHAR if
all data have a fixed length?..


None in postgres, but the char/varchar thing may or may not bite you at 
some point later - sounds like you have it covered though.



It's 2 times faster on InnoDB, and as it's just a SELECT query there's no need
to go into transaction details :-)


 Total runtime: 1.442 ms
(10 rows)

You posted a query that's taking 2/1000's of a second. I don't really 
see a performance problem here :)


--
Postgresql & php tutorials
http://www.designmagick.com/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Dimitri
Hi Heikki,

I've already tried a target of 1000, and the only thing it changes
compared to the current 100 (default) is that instead of 2404 rows it says
240 rows, but the plan remains the same..

Rgds,
-Dimitri

On 5/6/09, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 Dimitri wrote:
 any idea if there is a more optimal execution plan possible for this
 query:

 select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as
 hbeg,
 H.END_DATE as hend, H.NOTE as hnote
  from HISTORY H, STAT S
  where S.REF = H.REF_STAT
  and H.REF_OBJECT = '01'
  order by H.HORDER ;

 EXPLAIN ANALYZE output on 8.4:
                                    QUERY PLAN
  --------------------------------------------------------------------------
   Sort  (cost=4549.75..4555.76 rows=2404 width=176) (actual
  time=1.341..1.343 rows=20 loops=1)
     Sort Key: h.horder
     Sort Method:  quicksort  Memory: 30kB
     ->  Hash Join  (cost=33.50..4414.75 rows=2404 width=176) (actual
  time=1.200..1.232 rows=20 loops=1)
           Hash Cond: (h.ref_stat = s.ref)
           ->  Index Scan using history_ref_idx on history h
  (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052
  rows=20 loops=1)
                 Index Cond: (ref_object = '01'::bpchar)
           ->  Hash  (cost=21.00..21.00 rows=1000 width=45) (actual
  time=1.147..1.147 rows=1000 loops=1)
                 ->  Seq Scan on stat s  (cost=0.00..21.00 rows=1000
  width=45) (actual time=0.005..0.325 rows=1000 loops=1)
   Total runtime: 1.442 ms
  (10 rows)

 Table HISTORY contains 200M rows, only 20 needed
 Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY
 values.

 The plan doesn't look too bad to me, although the planner is
 over-estimating the number of matches in the history table (2404 vs 20).
 That's a bit surprising given how simple the predicate is. Make sure
 you've ANALYZEd the table. If that's not enough, you can try to increase
 the statistics target for the ref_object column, i.e. ALTER TABLE history
 ALTER COLUMN ref_object SET STATISTICS 500. That might give you a
 different plan, maybe with a nested loop join instead of a hash join,
 which might be faster in this case.

 --
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Dimitri
Hi Chris,

the only problem I see here is that it's 2 times slower vs InnoDB, so
before I tell myself it's OK I want to be sure there is nothing else
to do.. :-)

Rgds,
-Dimitri


On 5/6/09, Chris dmag...@gmail.com wrote:
 Dimitri wrote:
 Hi Craig,

 yes, you detailed the problem very well! :-)
 all those CHAR columns are just due to historical issues :-) as well,
 they may contain anything else and not only numbers, that's why..
 Also, all data inside are fixed-length, so VARCHAR will not save space, or
 what kind of performance issue may we expect with CHAR vs VARCHAR if
 all data have a fixed length?..

 None in postgres, but the char/varchar thing may or may not bite you at
 some point later - sounds like you have it covered though.

 It's 2 times faster on InnoDB, and as it's just a SELECT query there's no need
 to go into transaction details :-)

   Total runtime: 1.442 ms
 (10 rows)

 You posted a query that's taking 2/1000's of a second. I don't really
 see a performance problem here :)

 --
 Postgresql  php tutorials
 http://www.designmagick.com/



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Dimitri
Hi Richard,

no, of course it's not based on explain :-)
I've run several tests before, and am now going in depth to understand if
there is anything wrong. Due to such a single-query time difference, InnoDB
is doing a 2-3 times better TPS level compared to PostgreSQL..

Rgds,
-Dimitri


On 5/6/09, Richard Huxton d...@archonet.com wrote:
 Dimitri wrote:
 Hi Chris,

 the only problem I see here is that it's 2 times slower vs InnoDB

 How do you know? This isn't just based on the explain values reported,
 is it?

 --
Richard Huxton
Archonet Ltd


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Richard Huxton

Dimitri wrote:

Hi Chris,

the only problem I see here is that it's 2 times slower vs InnoDB


How do you know? This isn't just based on the explain values reported, 
is it?


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Richard Huxton

Dimitri wrote:

Hi Richard,

no, of course it's not based on explain :-)
I've run several tests before, and am now going in depth to understand if
there is anything wrong. Due to such a single-query time difference, InnoDB
is doing a 2-3 times better TPS level compared to PostgreSQL..


And you are satisfied that it is the planned query time that is the 
dominant factor here, and not parsing time, connection time, data 
transport, disk bandwidth etc?


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Merlin Moncure
On Wed, May 6, 2009 at 3:38 AM, Dimitri dimitrik...@gmail.com wrote:
 Hi,

 any idea if there is a more optimal execution plan possible for this query:

 select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg,
        H.END_DATE as hend, H.NOTE as hnote
         from HISTORY H, STAT S
         where S.REF = H.REF_STAT
         and H.REF_OBJECT = '01'
         order by H.HORDER ;

 EXPLAIN ANALYZE output on 8.4:
                                                                     QUERY PLAN
  --------------------------------------------------------------------------
   Sort  (cost=4549.75..4555.76 rows=2404 width=176) (actual
  time=1.341..1.343 rows=20 loops=1)
     Sort Key: h.horder
     Sort Method:  quicksort  Memory: 30kB
     ->  Hash Join  (cost=33.50..4414.75 rows=2404 width=176) (actual
  time=1.200..1.232 rows=20 loops=1)
           Hash Cond: (h.ref_stat = s.ref)
           ->  Index Scan using history_ref_idx on history h
  (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052
  rows=20 loops=1)
                 Index Cond: (ref_object = '01'::bpchar)
           ->  Hash  (cost=21.00..21.00 rows=1000 width=45) (actual
  time=1.147..1.147 rows=1000 loops=1)
                 ->  Seq Scan on stat s  (cost=0.00..21.00 rows=1000
  width=45) (actual time=0.005..0.325 rows=1000 loops=1)
   Total runtime: 1.442 ms
  (10 rows)

 Table HISTORY contains 200M rows, only 20 needed
 Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values.

 Table definitions:
 
 create table STAT
 (
    REF                 CHAR(3)            not null,
    NAME                CHAR(40)           not null,
    NUMB                INT                not null
 );

 create table HISTORY
 (
    REF_OBJECT          CHAR(10)              not null,
    HORDER              INT                   not null,
    REF_STAT            CHAR(3)               not null,
    BEGIN_DATE          CHAR(12)              not null,
    END_DATE            CHAR(12)                      ,
    NOTE                CHAR(100)
 );

 create unique index stat_ref_idx on STAT( ref );
 create index history_ref_idx on HISTORY( ref_object, horder );
 

 NOTE: The same query runs 2 times faster on MySQL.

couple of things to try:
*) as others have noted, get rid of char() columns. use varchar, or
int if you can.  this is a bigger deal in postgres than mysql.
*) curious if disabling sequential scan helps (set enable_seqscan =
false) or changes the plan.  .3 msec is spent on the seq scan and an index
lookup is likely much faster (see the session sketch after this list).
*) prepare the query:

prepare history_stat(char(10) as
  select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg,
   H.END_DATE as hend, H.NOTE as hnote
from HISTORY H, STAT S
where S.REF = H.REF_STAT
and H.REF_OBJECT = $1
order by H.HORDER ;

execute history_stat('01');

(prepared queries have some annoyances you  need to be prepared to
deal with. however, they are quite useful when squeezing every last
msec out of fast queries).
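
A session sketch combining the second suggestion with a plan check
(remember to re-enable seq scans afterwards):

set enable_seqscan = false;
explain analyze
select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg,
       H.END_DATE as hend, H.NOTE as hnote
  from HISTORY H, STAT S
 where S.REF = H.REF_STAT
   and H.REF_OBJECT = '01'
 order by H.HORDER;
set enable_seqscan = true;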

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Merlin Moncure
On Wed, May 6, 2009 at 7:46 AM, Merlin Moncure mmonc...@gmail.com wrote:
 prepare history_stat(char(10) as

typo:
prepare history_stat(char(10)) as

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Albe Laurenz
Dimitri wrote:
 I've run several tests before, and am now going in depth to understand if
 there is anything wrong. Due to such a single-query time difference, InnoDB
 is doing a 2-3 times better TPS level compared to PostgreSQL..

Why don't you use MySQL then?
Or tune PostgreSQL?

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Dimitri
I'll try to answer all mails at once :-))

- query is running fully in RAM, no I/O, no network, only CPU time

- looping 100 times the same query gives 132ms total time (~1.32ms per
query), while it's 44ms on InnoDB (~0.44ms per query)

- disabling seq scan forces the planner to use an index scan, and it's
finally worse, as it gives 1.53ms per query..

- preparing the query helps: the prepare statement takes 16ms, but execute
runs in 0.98ms => which makes me think it's not only planner
overhead... And it's still 2 times slower vs 0.44ms.
Also, generally prepare cannot be used in this test case, as we suppose
any query may be of any kind (even if it's not always true :-))

- char or varchar should be used here because the reference code is
supposed to accept any characters (alphanumeric)

- it also reminds me that there is probably some extra CPU time due to
locale settings - but all my lc_* variables are set to C...

Rgds,
-Dimitri


On 5/6/09, Merlin Moncure mmonc...@gmail.com wrote:
 On Wed, May 6, 2009 at 7:46 AM, Merlin Moncure mmonc...@gmail.com wrote:
 prepare history_stat(char(10) as

 typo:
 prepare history_stat(char(10)) as


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Dimitri
The story is simple: for the launch of MySQL 5.4 I've done some
testing comparing the variations of InnoDB engines available at that time,
and at the end, out of curiosity, started the same test with PostgreSQL
8.3.7 to see if MySQL's performance level is closer to PostgreSQL's
now (PG was a strong true winner before). To my big surprise, MySQL
5.4 outpaced 8.3.7...
However, analyzing the PostgreSQL processing I got a feeling something
goes wrong on the PG side.. So, now I've installed both 8.3.7 and 8.4beta1
to see more in depth what's going on. Currently 8.4 performs much
better than 8.3.7, but there is still room for improvement if such a
small query may go faster :-)

Rgds,
-Dimitri

On 5/6/09, Albe Laurenz laurenz.a...@wien.gv.at wrote:
 Dimitri wrote:
 I've run several tests before and now going in depth to understand if
 there is nothing wrong. Due such a single query time difference InnoDB
 is doing 2-3 times better TPS level comparing to PostgreSQL..

 Why don't you use MySQL then?
 Or tune PostgreSQL?

 Yours,
 Laurenz Albe


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Richard Huxton

Dimitri wrote:

I'll try to answer all mails at once :-))

- query is running fully in RAM, no I/O, no network, only CPU time

- looping 100 times the same query gives 132ms total time (~1.32ms per
query), while it's 44ms on InnoDB (~0.44ms per query)


Well, assuming you're happy that PG is tuned reasonably for your machine 
and that MySQL's query cache isn't returning the results here it looks 
like MySQL is faster for this particular query.


The only obvious place there could be a big gain is with the hashing 
algorithm. If you remove the ORDER BY and the query-time doesn't fall by 
much then it's the hash phase.


The other thing to try is to alter the query to be a SELECT count(*) 
rather than returning rows - that will let you measure the time to 
transfer the result rows.
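
For example, the two measurement variants would look like this (same
join and filter as the original query):

-- without the sort phase
select S.REF, S.NAME, H.HORDER, H.BEGIN_DATE, H.END_DATE, H.NOTE
  from HISTORY H, STAT S
 where S.REF = H.REF_STAT
   and H.REF_OBJECT = '01';

-- without transferring the result rows
select count(*)
  from HISTORY H, STAT S
 where S.REF = H.REF_STAT
   and H.REF_OBJECT = '01';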


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Kenneth Marshall
On Wed, May 06, 2009 at 04:01:03PM +0800, Craig Ringer wrote:
 Dimitri wrote:
 Hi,
 any idea if there is a more optimal execution plan possible for this 
 query:
 select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as 
 hbeg,
 H.END_DATE as hend, H.NOTE as hnote
  from HISTORY H, STAT S
  where S.REF = H.REF_STAT
  and H.REF_OBJECT = '01'
  order by H.HORDER ;

 OK, so you're taking a simple:

history INNER JOIN stat ON (stat.ref = history.ref_stat)

 then filtering for records with a particular value of history.ref_object 
 and finally performing a sort.

 If I'm reading it right, the plan below does a sequential scan on the 
 `stat' table. The stat table only has 1000 rows, so this isn't necessarily 
 an unreasonable choice even if there is an appropriate index and even if 
 not many of the rows will be needed.

 It then does an index scan of the history table looking for tuples with 
 ref_object = '01' (text match). It hash joins the hashed results of 
 the initial seq scan to the results of the index scan, and sorts the 
 result.

 To me, that looks pretty reasonable. You might be able to avoid the hash 
 join in favour of a nested loop scan of stat_ref_idx (looping over records 
 from history.ref_stat where ref_object = '001') by providing a 
 composite index on HISTORY(ref_stat, ref_object). I'm really not too sure, 
 though; plan optimization isn't my thing, I'm just seeing if I can offer a 
 few ideas.

 Table definitions:

 While not strictly necessary, it's a *REALLY* good idea to define a 
 suitable PRIMARY KEY.

 Also, the `CHAR(n)' data type is evil. E.V.I.L. Use `varchar(n)' for 
 bounded-length values, or `text' for unbounded fields, unless you REALLY 
 want the crazy behaviour of `CHAR(n)'.

 I'm a little bit puzzled about why you seem to be doing lots of things with 
 integer values stored in text strings, but that probably doesn't matter too 
 much for the issue at hand.

 NOTE: The same query runs 2 times faster on MySQL.

 With InnoDB tables and proper transactional safety? Or using scary MyISAM 
 tables and a "just pray" approach to data integrity? If you're using MyISAM 
 tables I'm not surprised; MySQL with MyISAM is stunningly fast, but 
 oh-my-god dangerous.

 --
 Craig Ringer

I just thought I would ask. Are you using the query cache in MySQL?
If that is on, that could be the difference. Another thing to check:
try issuing the selects concurrently: 2 at a time, 5 at a time, 10
at a time... and see if that has an effect on timing. In many of the
benchmarks, MySQL will outperform PostgreSQL for very low numbers of
clients. Once you are using more than a handful, PostgreSQL pulls
ahead. Also, is this a completely static table? i.e. no updates or
inserts. How is the performance with those happening? This should
help you get a clearer picture of the performance.
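
One way to drive those concurrency levels is pgbench with a custom
script (the file and database names are hypothetical):

  $ cat query.sql
  select S.REF as stref, S.NAME as stnm, H.HORDER as hord,
         H.BEGIN_DATE as hbeg, H.END_DATE as hend, H.NOTE as hnote
    from HISTORY H, STAT S
   where S.REF = H.REF_STAT
     and H.REF_OBJECT = '01'
   order by H.HORDER;
  $ pgbench -n -c 2 -t 1000 -f query.sql testdb
  $ pgbench -n -c 5 -t 1000 -f query.sql testdb
  $ pgbench -n -c 10 -t 1000 -f query.sql testdb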

My two cents.
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Kenneth Marshall
On Wed, May 06, 2009 at 02:49:23PM +0200, Dimitri wrote:
 The story is simple: for the launch of MySQL 5.4 I've done some
 testing comparing the variations of InnoDB engines available at that time,
 and at the end, out of curiosity, started the same test with PostgreSQL
 8.3.7 to see if MySQL's performance level is closer to PostgreSQL's
 now (PG was a strong true winner before). To my big surprise, MySQL
 5.4 outpaced 8.3.7...
 However, analyzing the PostgreSQL processing I got a feeling something
 goes wrong on the PG side.. So, now I've installed both 8.3.7 and 8.4beta1
 to see more in depth what's going on. Currently 8.4 performs much
 better than 8.3.7, but there is still room for improvement if such a
 small query may go faster :-)
 
 Rgds,
 -Dimitri
 
 On 5/6/09, Albe Laurenz laurenz.a...@wien.gv.at wrote:
  Dimitri wrote:
   I've run several tests before, and am now going in depth to understand if
   there is anything wrong. Due to such a single-query time difference, InnoDB
   is doing a 2-3 times better TPS level compared to PostgreSQL..
 
  Why don't you use MySQL then?
  Or tune PostgreSQL?
 
  Yours,
  Laurenz Albe
 

Another thought: have you tuned PostgreSQL for an in-memory database?
Those tuning options may be what is needed to improve the plan chosen
by PostgreSQL.
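
For example, in postgresql.conf (values are illustrative for a data set
that fits in RAM; adjust to the machine):

  shared_buffers = 4GB           # large enough to hold the working set
  effective_cache_size = 24GB    # tell the planner most data is cached
  seq_page_cost = 1.0
  random_page_cost = 1.0         # random reads from RAM cost the same as sequential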

Cheers,
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Ries van Twisk


On May 6, 2009, at 7:53 AM, Richard Huxton wrote:


Dimitri wrote:

I'll try to answer all mails at once :-))
- query is running fully in RAM, no I/O, no network, only CPU time
- looping 100 times the same query gives 132ms total time (~1.32ms per
query), while it's 44ms on InnoDB (~0.44ms per query)


Well, assuming you're happy that PG is tuned reasonably for your  
machine and that MySQL's query cache isn't returning the results  
here it looks like MySQL is faster for this particular query.


The only obvious place there could be a big gain is with the hashing  
algorithm. If you remove the ORDER BY and the query-time doesn't  
fall by much then it's the hash phase.


The other thing to try is to alter the query to be a SELECT count(*)  
rather than returning rows - that will let you measure the time to  
transfer the result rows.


--
 Richard Huxton
 Archonet Ltd




Do you expect to run this query 100 times per second during your
application? Or is this just a test to see how fast the query is, for
optimisation?

I always get scared myself with such a test as 'runs out of memory';
the reason given is that usually this is not really the case in a
production environment.

Try to make a little test case where you give the query random
parameters, so different result sets are returned. This will give you a
better idea of how fast the query really is, and might give you better
comparison results.

Instead of count(*) I usually do explain analyze to see how fast
PostgreSQL handles the query.


Ries



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Dimitri
Folks, first of all:

  - I used a fixed reference value just to simplify analyzing the case
and isolate it as much as possible; of course, during my tests all
values are random :-)

- the final goal of the test is to analyze scalability, so yes, concurrent
sessions with random keys grow from 1 to 256 (I run it on a
32-core server, no think time, just stressing), and the result is
still not yet better compared to InnoDB

- I'm analyzing this query running in memory to understand what's
blocking while all main bottlenecks are avoided (no I/O anymore nor
network, etc.)

- the initial explain analyze and table details were posted in the first message


Now, let's go further:

 - 'as is', the query execution took 1.50ms

 - after removing the ORDER BY it took 1.19ms

 - select count(*) instead of the columns, with the ORDER BY removed, took 0.98ms

- execute of the same prepared select count(*) ... took 0.68ms

So, where is the time going?...

Rgds,
-Dimitri


On 5/6/09, Ries van Twisk p...@rvt.dds.nl wrote:

 On May 6, 2009, at 7:53 AM, Richard Huxton wrote:

 Dimitri wrote:
 I'll try to answer all mails at once :-))
 - query is running fully in RAM, no I/O, no network, only CPU time
 - looping 100 times the same query gives 132ms total time (~1.32ms per
 query), while it's 44ms on InnoDB (~0.44ms per query)

 Well, assuming you're happy that PG is tuned reasonably for your
 machine and that MySQL's query cache isn't returning the results
 here it looks like MySQL is faster for this particular query.

 The only obvious place there could be a big gain is with the hashing
 algorithm. If you remove the ORDER BY and the query-time doesn't
 fall by much then it's the hash phase.

 The other thing to try is to alter the query to be a SELECT count(*)
 rather than returning rows - that will let you measure the time to
 transfer the result rows.

 --
  Richard Huxton
  Archonet Ltd



 Do you expect to run this query 100 times per second during your
 application? Or is this just a test to see how fast the query is, for
 optimisation?

 I always get scared myself with such a test as 'runs out of memory';
 the reason given is that usually this is not really the case in a
 production environment.

 Try to make a little test case where you give the query random
 parameters, so different result sets are returned. This will give you a
 better idea of how fast the query really is, and might give you better
 comparison results.

 Instead of count(*) I usually do explain analyze to see how fast
 PostgreSQL handles the query.

 Ries




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Kenneth Marshall
No.

Ken
On Wed, May 06, 2009 at 10:04:33PM +0800, Craig Ringer wrote:
 Dimitri wrote:
  Hi Chris,
  
  the only problem I see here is that it's 2 times slower vs InnoDB, so
  before I tell myself it's OK I want to be sure there is nothing else
  to do.. :-)
 
 Can the genetic query optimizer come into play on small queries?
 
 --
 Craig Ringer
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Matthew Wakeling

On Wed, May 06, 2009 at 10:04:33PM +0800, Craig Ringer wrote:

Can the genetic query optimizer come into play on small queries?


On Wed, 6 May 2009, Kenneth Marshall wrote:

No.


Yes. But you would have had to have set some really weird configuration.
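
For illustration (assuming otherwise default settings): GEQO only
engages once the number of FROM items reaches geqo_threshold, which
defaults to 12, so forcing it onto a two-table join would take
something like:

   set geqo_threshold = 2;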

Matthew

--
And the lexer will say "Oh look, there's a null string. Oooh, there's 
another. And another.", and will fall over spectacularly when it realises
there are actually rather a lot.
- Computer Science Lecturer (edited)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Dimitri
I supposed that with prepare and then execute, the query optimizer no
longer comes into play during the execute phase, or did I miss something?..

Forgot to say: the query cache is disabled on the MySQL side.

Rgds,
-Dimitri

On 5/6/09, Craig Ringer cr...@postnewspapers.com.au wrote:
 Dimitri wrote:
 Hi Chris,

 the only problem I see here is that it's 2 times slower vs InnoDB, so
 before I tell myself it's OK I want to be sure there is nothing else
 to do.. :-)

 Can the genetic query optimizer come into play on small queries?

 --
 Craig Ringer


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Craig Ringer
Dimitri wrote:
 Hi Chris,
 
 the only problem I see here is that it's 2 times slower vs InnoDB, so
 before I tell myself it's OK I want to be sure there is nothing else
 to do.. :-)

Can the genetic query optimizer come into play on small queries?

--
Craig Ringer

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Kenneth Marshall
On Wed, May 06, 2009 at 09:38:59AM +0200, Dimitri wrote:
 Hi,
 
 any idea if there is a more optimal execution plan possible for this query:
 
 select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg,
 H.END_DATE as hend, H.NOTE as hnote
  from HISTORY H, STAT S
  where S.REF = H.REF_STAT
  and H.REF_OBJECT = '01'
  order by H.HORDER ;
 
 EXPLAIN ANALYZE output on 8.4:
                                    QUERY PLAN
  --------------------------------------------------------------------------
   Sort  (cost=4549.75..4555.76 rows=2404 width=176) (actual
  time=1.341..1.343 rows=20 loops=1)
     Sort Key: h.horder
     Sort Method:  quicksort  Memory: 30kB
     ->  Hash Join  (cost=33.50..4414.75 rows=2404 width=176) (actual
  time=1.200..1.232 rows=20 loops=1)
           Hash Cond: (h.ref_stat = s.ref)
           ->  Index Scan using history_ref_idx on history h
  (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052
  rows=20 loops=1)
                 Index Cond: (ref_object = '01'::bpchar)
           ->  Hash  (cost=21.00..21.00 rows=1000 width=45) (actual
  time=1.147..1.147 rows=1000 loops=1)
                 ->  Seq Scan on stat s  (cost=0.00..21.00 rows=1000
  width=45) (actual time=0.005..0.325 rows=1000 loops=1)
   Total runtime: 1.442 ms
  (10 rows)
 
 Table HISTORY contains 200M rows, only 20 needed
 Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values.
 
 Table definitions:
 
 create table STAT
 (
  REF                 CHAR(3)            not null,
  NAME                CHAR(40)           not null,
  NUMB                INT                not null
 );
 
 create table HISTORY
 (
  REF_OBJECT          CHAR(10)           not null,
  HORDER              INT                not null,
  REF_STAT            CHAR(3)            not null,
  BEGIN_DATE          CHAR(12)           not null,
  END_DATE            CHAR(12)           ,
  NOTE                CHAR(100)
 );
 
 create unique index stat_ref_idx on STAT( ref );
 create index history_ref_idx on HISTORY( ref_object, horder );
 
 
 NOTE: The same query runs 2 times faster on MySQL.
 
 Any idea?..
 
 Rgds,
 -Dimitri
 
Dimitri,

Is there any chance of profiling the postgres backend to see
where the time is used?

Just an idea,
Ken

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Dimitri
Hi Ken,

yes, I may do it, but I did not expect to get into profiling initially :-)
I expected there was just something trivial within the plan that I just
didn't know.. :-)

BTW, is there already an integrated profiler within the code? or do I
need external tools?..

Rgds,
-Dimitri

On 5/6/09, Kenneth Marshall k...@rice.edu wrote:
 On Wed, May 06, 2009 at 09:38:59AM +0200, Dimitri wrote:
 Hi,

 any idea if there is a more optimal execution plan possible for this
 query:

 select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as
 hbeg,
 H.END_DATE as hend, H.NOTE as hnote
  from HISTORY H, STAT S
  where S.REF = H.REF_STAT
  and H.REF_OBJECT = '01'
  order by H.HORDER ;

 EXPLAIN ANALYZE output on 8.4:
                                    QUERY PLAN
  --------------------------------------------------------------------------
   Sort  (cost=4549.75..4555.76 rows=2404 width=176) (actual
  time=1.341..1.343 rows=20 loops=1)
     Sort Key: h.horder
     Sort Method:  quicksort  Memory: 30kB
     ->  Hash Join  (cost=33.50..4414.75 rows=2404 width=176) (actual
  time=1.200..1.232 rows=20 loops=1)
           Hash Cond: (h.ref_stat = s.ref)
           ->  Index Scan using history_ref_idx on history h
  (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052
  rows=20 loops=1)
                 Index Cond: (ref_object = '01'::bpchar)
           ->  Hash  (cost=21.00..21.00 rows=1000 width=45) (actual
  time=1.147..1.147 rows=1000 loops=1)
                 ->  Seq Scan on stat s  (cost=0.00..21.00 rows=1000
  width=45) (actual time=0.005..0.325 rows=1000 loops=1)
   Total runtime: 1.442 ms
  (10 rows)

 Table HISTORY contains 200M rows, only 20 needed
 Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY
 values.

 Table definitions:
 
 create table STAT
 (
  REF                 CHAR(3)            not null,
  NAME                CHAR(40)           not null,
  NUMB                INT                not null
 );

 create table HISTORY
 (
  REF_OBJECT          CHAR(10)           not null,
  HORDER              INT                not null,
  REF_STAT            CHAR(3)            not null,
  BEGIN_DATE          CHAR(12)           not null,
  END_DATE            CHAR(12)           ,
  NOTE                CHAR(100)
 );

 create unique index stat_ref_idx on STAT( ref );
 create index history_ref_idx on HISTORY( ref_object, horder );
 

 NOTE: The same query runs 2 times faster on MySQL.

 Any idea?..

 Rgds,
 -Dimitri

 Dimitri,

 Is there any chance of profiling the postgres backend to see
 where the time is used?

 Just an idea,
 Ken


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Kenneth Marshall
On Wed, May 06, 2009 at 04:48:21PM +0200, Dimitri wrote:
 Hi Ken,
 
 yes, I may do it, but I did not expect to get into profiling initially :-)
 I expected there was just something trivial within the plan that I just
 didn't know.. :-)
 
 BTW, is there already an integrated profiler within the code? or do I
 need external tools?..
 
 Rgds,
 -Dimitri

I only suggested it because it might have the effect of changing
the sequential scan on the stat table to an indexed scan.

Cheers,
Ken
 
 On 5/6/09, Kenneth Marshall k...@rice.edu wrote:
  On Wed, May 06, 2009 at 09:38:59AM +0200, Dimitri wrote:
  Hi,
 
  any idea if there is a more optimal execution plan possible for this
  query:
 
  select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as
  hbeg,
  H.END_DATE as hend, H.NOTE as hnote
   from HISTORY H, STAT S
   where S.REF = H.REF_STAT
   and H.REF_OBJECT = '01'
   order by H.HORDER ;
 
  EXPLAIN ANALYZE output on 8.4:
                                     QUERY PLAN
   -------------------------------------------------------------------------
    Sort  (cost=4549.75..4555.76 rows=2404 width=176) (actual
   time=1.341..1.343 rows=20 loops=1)
      Sort Key: h.horder
      Sort Method:  quicksort  Memory: 30kB
      ->  Hash Join  (cost=33.50..4414.75 rows=2404 width=176) (actual
   time=1.200..1.232 rows=20 loops=1)
            Hash Cond: (h.ref_stat = s.ref)
            ->  Index Scan using history_ref_idx on history h
   (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052
   rows=20 loops=1)
                  Index Cond: (ref_object = '01'::bpchar)
            ->  Hash  (cost=21.00..21.00 rows=1000 width=45) (actual
   time=1.147..1.147 rows=1000 loops=1)
                  ->  Seq Scan on stat s  (cost=0.00..21.00 rows=1000
   width=45) (actual time=0.005..0.325 rows=1000 loops=1)
    Total runtime: 1.442 ms
   (10 rows)
 
  Table HISTORY contains 200M rows, only 20 needed
  Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY
  values.
 
  Table definitions:
  
  create table STAT
  (
   REF                 CHAR(3)            not null,
   NAME                CHAR(40)           not null,
   NUMB                INT                not null
  );
 
  create table HISTORY
  (
   REF_OBJECT          CHAR(10)           not null,
   HORDER              INT                not null,
   REF_STAT            CHAR(3)            not null,
   BEGIN_DATE          CHAR(12)           not null,
   END_DATE            CHAR(12)           ,
   NOTE                CHAR(100)
  );
 
  create unique index stat_ref_idx on STAT( ref );
  create index history_ref_idx on HISTORY( ref_object, horder );
  
 
  NOTE: The same query runs 2 times faster on MySQL.
 
  Any idea?..
 
  Rgds,
  -Dimitri
 
  Dimitri,
 
  Is there any chance of profiling the postgres backend to see
  where the time is used?
 
  Just an idea,
  Ken
 
 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Simon Riggs

On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote:

 The problem has been finding someone who has both the time and the
 ability to do the work.

Unfortunately there has been significant debate over which parts of
partitioning need to be improved. My own view is that considerable
attention needs to be applied to both the executor and planner to
improve matters and that syntax improvements are largely irrelevant,
though seductive.

Deep improvements will require significant analysis, agreement, effort
and skill. What we have now took approximately 20 days to implement,
with later patches adding about another 10-20 days of work. I'd estimate
the required work at 60-100 days of work from the primary author, plus
planning and discussion time. YMMV.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Simon Riggs

On Wed, 2009-05-06 at 10:31 +0200, Dimitri wrote:

 I've already tried a target of 1000, and the only thing it changes
 compared to the current 100 (default) is that instead of 2404 rows it says
 240 rows, but the plan remains the same..

Try both of these things
* REINDEX on the index being used in the query, then re-EXPLAIN
* enable_hashjoin = off, then re-EXPLAIN
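
Roughly, as a sketch (compare the EXPLAIN ANALYZE output of the original
query after each step):

REINDEX INDEX history_ref_idx;
-- re-run EXPLAIN ANALYZE here
SET enable_hashjoin = off;
-- re-run EXPLAIN ANALYZE again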

You should first attempt to get the same plan, then confirm it really is
faster before we worry why the optimizer hadn't picked that plan. 

We already know that MySQL favors nested loop joins, so turning up a
plan that on this occasion is actually better that way is in no way
representative of general performance. Does MySQL support hash joins?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote:
 The problem has been finding someone who has both the time and the
 ability to do the work.

 Unfortunately there has been significant debate over which parts of
 partitioning need to be improved. My own view is that considerable
 attention needs to be applied to both the executor and planner to
 improve matters and that syntax improvements are largely irrelevant,
 though seductive.

My thought about it is that what we really need is an explicit notion
of partitioned tables built into the system, instead of trying to make
the planner re-deduce the partitioning behavior from first principles
every time it builds a plan for such a table.  Such a notion would
presumably involve some new syntax to allow the partitioning rule to be
specified at table creation time.  I agree that the syntax details are a
minor issue, but the set of possible partitioning rules is certainly a
topic of great interest.
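
A minimal sketch of what that re-deduction looks like today with
inheritance (table names and ranges are illustrative):

create table measurement (logdate date not null, value int);
create table measurement_2009m05 (
    check ( logdate >= date '2009-05-01' and logdate < date '2009-06-01' )
) inherits (measurement);
-- with constraint_exclusion on, the planner proves from the CHECK
-- constraints, at plan time, which children can be skipped
set constraint_exclusion = on;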

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] GiST index performance

2009-05-06 Thread Tom Lane
Matthew Wakeling matt...@flymine.org writes:
 Here is my patch ported over to the seg contrib package, attached. Apply 
 it to seg.c and all should be well. A similar thing needs to be done to 
 cube, but I haven't looked at that.

Teodor, Oleg, do you intend to review/apply this patch?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Simon Riggs

On Wed, 2009-05-06 at 17:55 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote:
  The problem has been finding someone who has both the time and the
  ability to do the work.
 
  Unfortunately there has been significant debate over which parts of
  partitioning need to be improved. My own view is that considerable
  attention needs to be applied to both the executor and planner to
  improve matters and that syntax improvements are largely irrelevant,
  though seductive.
 
 My thought about it is that what we really need is an explicit notion
 of partitioned tables built into the system, instead of trying to make
 the planner re-deduce the partitioning behavior from first principles
 every time it builds a plan for such a table.  Such a notion would
 presumably involve some new syntax to allow the partitioning rule to be
 specified at table creation time.  I agree that the syntax details are a
 minor issue, but the set of possible partitioning rules is certainly a
 topic of great interest.

Agreed. Perhaps I should say then that the syntax needs to express the
requirements of the planner/executor behaviour, rather than being the
main aspect of the feature, as some have suggested.

Hopefully, notions of partitioning won't be directly tied to chunking of
data for parallel query access. Most queries access recent data and
hence only a single partition (or stripe), so partitioning and
parallelism are frequently exactly orthogonal. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 I think there should be a way to refer to individual partitions as
 objects.

Yeah, the individual partitions should be nameable tables, otherwise we
will be reinventing a *whole* lot of management stuff to little gain.
I don't actually think there is anything wrong with using table
inheritance as the basic infrastructure --- I just want more smarts
about one particular use pattern of inheritance.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Alvaro Herrera
Simon Riggs wrote:


 Hopefully, notions of partitioning won't be directly tied to chunking of
 data for parallel query access. Most queries access recent data and
 hence only a single partition (or stripe), so partitioning and
 parallelism are frequently exactly orthogonal. 

I think there should be a way to refer to individual partitions as
objects.  That way we could execute some commands to enable certain
optimizations, for example "mark this partition read only", which would
mean it could be marked as not needing vacuum.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance