Re: [PERFORM] Tuning/performance issue...

2003-10-04 Thread Jeff
On Fri, 3 Oct 2003, Bruce Momjian wrote:

>
> I have updated the FAQ to be:
>
>   In comparison to MySQL or leaner database systems, we are
>   faster for multiple users, complex queries, and a read/write query
>   load.  MySQL is faster for SELECT queries done by a few users.
>
> Is this accurate?  It seems so.
>
>

Another thing I noticed: if you use a dataset that can live in MySQL's
query cache / OS cache, it screams - until it has to hit the disk. Then:
GRINDING HALT.

It would be nice if someone (I don't have the time now) did a comparison
of, say:
select value where name = 'XXX'; [where XXX varies] with 1, 10, 20, and 50
connections

then make progressively more complex queries, and occasionally point out
MySQL's silly omissions:
select * from myview where id = 1234
[Oh wait! MySQL doesn't have views. Oopsy!]

Wrapping up - PG is not that slow for simple queries either.  It can be
rather zippy, and PREPARE can give HUGE gains, even for simple
statements.  I've often wondered if yacc, etc., is a bottleneck (you can
only go as fast as your parser can go).
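
For anyone who hasn't tried it, here's a minimal sketch of the kind of
PREPARE usage I mean (the table, column, and statement names here are made
up for illustration):

```sql
-- Parse and plan the statement once; each EXECUTE then skips the
-- parser and planner entirely and just runs the stored plan.
PREPARE get_value (text) AS
    SELECT value FROM lookup_table WHERE name = $1;

EXECUTE get_value('foo');   -- no parse/plan cost per call
EXECUTE get_value('bar');

DEALLOCATE get_value;       -- free the prepared statement when done
```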

Hurray for PG!

And I'm giving my PG presentation Monday.  I hope to post it Tuesday after
I update it with comments I receive and remove confidential information.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Tuning/performance issue...

2003-10-03 Thread Tom Lane
Rod Taylor <[EMAIL PROTECTED]> writes:
> I've often wondered if they win on those because they have a
> lighter-weight parser / optimizer with fewer "let's try simplifying
> this query" steps, or if the MyISAM storage mechanism is simply quicker
> at pulling data off the disk.

Comparing pre-PREPAREd queries would probably tell something about that.

regards, tom lane



Re: [PERFORM] Tuning/performance issue...

2003-10-03 Thread Christopher Browne
Centuries ago, Nostradamus foresaw when [EMAIL PROTECTED] (Bruce Momjian) would write:
> I have updated the FAQ to be:
>
>   In comparison to MySQL or leaner database systems, we are
>   faster for multiple users, complex queries, and a read/write query
>   load.  MySQL is faster for SELECT queries done by a few users. 
>
> Is this accurate?  It seems so.

I would think it more accurate if you use the phrase "faster for
simple SELECT queries."

MySQL uses a rule-based optimizer which, when the data fits the rules
well, can pump queries through lickety-split without any appreciable
pause for evaluation (or reflection :-).  That's _quite_ a successful
strategy when users are doing what loosely amounts to evaluating
association tables.

select * from table where key = value;

Which is just like tying a Perl variable to a hash table, and doing
   $value = $TABLE{$key};

In web applications where they wanted something a _little_ more
structured than hash tables, that may 'hit the spot.'

Anything hairier than that gets, of course, hairier.  If you want
something that's TRULY more structured, you may lose a lot of hair
:-).
-- 
output = reverse("gro.gultn" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/oses.html
"If you want to talk with some experts about something, go to the bar
where they hang out, buy a round of beers, and they'll surely talk
your ear off, leaving you wiser than before.

If you, a stranger, show up at the bar, walk up to the table, and ask
them to fax you a position paper, they'll tell you to call their
office in the morning and ask for a rate sheet." -- Miguel Cruz



Re: [PERFORM] Tuning/performance issue...

2003-10-03 Thread Bruce Momjian
Rod Taylor wrote:
-- Start of PGP signed section.
> On Fri, 2003-10-03 at 21:39, Bruce Momjian wrote:
> > I have updated the FAQ to be:
> > 
> >   In comparison to MySQL or leaner database systems, we are
> >   faster for multiple users, complex queries, and a read/write query
> >   load.  MySQL is faster for SELECT queries done by a few users. 
> > 
> > Is this accurate?  It seems so.
> 
> May wish to say ... for simple SELECT queries ...

Updated.

> Several left outer joins, subselects and a large number of joins are
> regularly performed faster in PostgreSQL due to a more mature optimizer.
> 
> But MySQL can pump out SELECT * FROM table WHERE key = value; queries in
> a hurry.
> 
> 
> I've often wondered if they win on those because they have a
> lighter-weight parser / optimizer with fewer "let's try simplifying
> this query" steps

I think that is part of it.

> or if the MyISAM storage mechanism is simply quicker at pulling
> data off the disk.

And their heap is indexed by MyISAM, right?  I know with Ingres that ISAM
was usually faster than btree because you didn't have all those leaves
to traverse to get to the data.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] Tuning/performance issue...

2003-10-03 Thread Rod Taylor
On Fri, 2003-10-03 at 21:39, Bruce Momjian wrote:
> I have updated the FAQ to be:
> 
>   In comparison to MySQL or leaner database systems, we are
>   faster for multiple users, complex queries, and a read/write query
>   load.  MySQL is faster for SELECT queries done by a few users. 
> 
> Is this accurate?  It seems so.

May wish to say ... for simple SELECT queries ...

Several left outer joins, subselects and a large number of joins are
regularly performed faster in PostgreSQL due to a more mature optimizer.

But MySQL can pump out SELECT * FROM table WHERE key = value; queries in
a hurry.


I've often wondered if they win on those because they have a lighter-weight
parser / optimizer with fewer "let's try simplifying this query" steps, or
if the MyISAM storage mechanism is simply quicker at pulling data off the
disk.





Re: [PERFORM] Tuning/performance issue...

2003-10-03 Thread Bruce Momjian

I have updated the FAQ to be:

  In comparison to MySQL or leaner database systems, we are
  faster for multiple users, complex queries, and a read/write query
  load.  MySQL is faster for SELECT queries done by a few users. 

Is this accurate?  It seems so.

---

Oleg Lebedev wrote:
> Jeff,
> I would really appreciate if you could send me that lengthy presentation
> that you've written on pg/other dbs comparison.
> Thanks.
> 
> Oleg
> 
> -Original Message-
> From: Jeff [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, October 01, 2003 6:23 AM
> To: David Griffiths
> Cc: [EMAIL PROTECTED]
> Subject: Re: [PERFORM] Tuning/performance issue...
> Importance: Low
> 
> 
> On Tue, 30 Sep 2003, David Griffiths wrote:
> 
> >
> > This is all part of a "migrate away from Oracle" project. We are
> > looking at 3 databases - MySQL (InnoDB), Postgres and Matisse (object
> > oriented). We have a lot of queries like this or worse, and I'm worried
> > that many of them would need to be re-written. The developers know SQL,
> > but nothing about tuning, etc.
> >
> 
> There's a movement at my company to ditch several commercial DBs in
> favor of a free one.  I'm currently the big pg fan around here and I've
> actually written a rather lengthy presentation about pg features, why,
> tuning, etc.; another part was some comparisons to other DBs.
>
> So that I wouldn't be blindly flaming MySQL, I decided to give it a
> whirl and loaded it up with the same dataset as pg.  The first thing I
> hit was the lack of stored procedures, but I decided to code around
> that, giving MySQL the benefit of the doubt.  What I found was
> interesting.
>
> For 1-2 concurrent 'beaters' it screamed - ultra-fast.  But if you
> increase the concurrent beaters up to, say, 20, MySQL comes to a
> grinding halt; MySQL and the machine itself become fairly unresponsive.
> And if you do cache-unfriendly queries it becomes even worse.  On PG:
> no problems at all.  It scaled up fine and dandy, and with 40 concurrent
> beaters the machine was still responsive.  (The numbers for 20 clients
> were 220 seconds (pg) and 650 seconds (mysql).)
> 
> So that is another test to try out - Given your configuration I expect
> you have lots of concurrent activity.
> 
> --
> Jeff Trout <[EMAIL PROTECTED]>
> http://www.jefftrout.com/
> http://www.stuarthamm.net/
> 
> 
> 
> 
> *
> 
> This e-mail may contain privileged or confidential material intended for the named 
> recipient only.
> If you are not the named recipient, delete this message and all attachments.
> Unauthorized reviewing, copying, printing, disclosing, or otherwise using 
> information in this e-mail is prohibited.
> We reserve the right to monitor e-mail sent through our network. 
> 
> *
> 
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] Tuning/performance issue...

2003-10-01 Thread Oleg Lebedev
That would be great! When do you think this would be ready for us to see
;?)

-Original Message-
From: Jeff [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 01, 2003 11:42 AM
To: Oleg Lebedev
Cc: [EMAIL PROTECTED]
Subject: RE: [PERFORM] Tuning/performance issue...


On Wed, 1 Oct 2003, Oleg Lebedev wrote:

> Jeff,
> I would really appreciate if you could send me that lengthy 
> presentation that you've written on pg/other dbs comparison. Thanks.
>

After I give the presentation at work and collect comments from my
coworkers (and remove some information you folks don't need to know :) I
will be very willing to post it for people to see.


--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/




Re: [PERFORM] Tuning/performance issue...

2003-10-01 Thread Jeff
On Wed, 1 Oct 2003, Oleg Lebedev wrote:

> Jeff,
> I would really appreciate if you could send me that lengthy presentation
> that you've written on pg/other dbs comparison.
> Thanks.
>

After I give the presentation at work and collect comments from my
coworkers (and remove some information you folks don't need to know :) I
will be very willing to post it for people to see.


--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/





Re: [PERFORM] Tuning/performance issue...

2003-10-01 Thread Oleg Lebedev
Jeff,
I would really appreciate if you could send me that lengthy presentation
that you've written on pg/other dbs comparison.
Thanks.

Oleg

-Original Message-
From: Jeff [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 01, 2003 6:23 AM
To: David Griffiths
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Tuning/performance issue...
Importance: Low


On Tue, 30 Sep 2003, David Griffiths wrote:

>
> This is all part of a "migrate away from Oracle" project. We are
> looking at 3 databases - MySQL (InnoDB), Postgres and Matisse (object
> oriented). We have a lot of queries like this or worse, and I'm worried
> that many of them would need to be re-written. The developers know SQL,
> but nothing about tuning, etc.
>

There's a movement at my company to ditch several commercial DBs in
favor of a free one.  I'm currently the big pg fan around here and I've
actually written a rather lengthy presentation about pg features, why,
tuning, etc.; another part was some comparisons to other DBs.

So that I wouldn't be blindly flaming MySQL, I decided to give it a whirl
and loaded it up with the same dataset as pg.  The first thing I hit was
the lack of stored procedures, but I decided to code around that, giving
MySQL the benefit of the doubt.  What I found was interesting.

For 1-2 concurrent 'beaters' it screamed - ultra-fast.  But if you
increase the concurrent beaters up to, say, 20, MySQL comes to a grinding
halt; MySQL and the machine itself become fairly unresponsive.  And if you
do cache-unfriendly queries it becomes even worse.  On PG: no problems at
all.  It scaled up fine and dandy, and with 40 concurrent beaters the
machine was still responsive.  (The numbers for 20 clients were 220
seconds (pg) and 650 seconds (mysql).)

So that is another test to try out - Given your configuration I expect
you have lots of concurrent activity.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/







Re: [PERFORM] Tuning/performance issue...

2003-10-01 Thread Tom Lane
David Griffiths <[EMAIL PROTECTED]> writes:
>> The reason PG's planner doesn't discover this join order for itself
>> is that it's written to not attempt to re-order outer joins from the
>> syntactically defined ordering.  In general, such reordering would
>> change the results.  It is possible to analyze the query and prove that
>> certain reorderings are valid (don't change the results), but we don't
>> currently have code to do that.

> Not sure I follow. Are you saying that, depending on when the outer-join is
> applied to the rows found at the time, you may end up with a different set
> of rows?

Here's an example showing that it's not always safe to rearrange join
order in the presence of outer joins:

jtest=# create table a (f1 int);
CREATE TABLE
jtest=# create table b (f1 int, f2 int);
CREATE TABLE
jtest=# create table c(f1 int, f2 int);
CREATE TABLE
jtest=# insert into a values (1);
INSERT 431307 1
jtest=# insert into b values (10,10);
INSERT 431308 1
jtest=# insert into b values (11,11);
INSERT 431309 1
jtest=# insert into c values (1,10);
INSERT 431310 1
jtest=# insert into c values (2,11);
INSERT 431311 1

jtest=# SELECT * FROM a, b LEFT JOIN c ON b.f2 = c.f2 WHERE a.f1 = c.f1;
 f1 | f1 | f2 | f1 | f2
----+----+----+----+----
  1 | 10 | 10 |  1 | 10
(1 row)

Per spec the JOIN operator binds more tightly than comma, so this is
equivalent to:

jtest=# SELECT * FROM a JOIN (b LEFT JOIN c ON b.f2 = c.f2) ON a.f1 = c.f1;
 f1 | f1 | f2 | f1 | f2
----+----+----+----+----
  1 | 10 | 10 |  1 | 10
(1 row)

Now suppose we try to join A and C before joining to B:

jtest=# SELECT * FROM b LEFT JOIN (a join c ON a.f1 = c.f1) ON b.f2 = c.f2;
 f1 | f2 | f1 | f1 | f2
----+----+----+----+----
 10 | 10 |  1 |  1 | 10
 11 | 11 |    |    |
(2 rows)

We get a different answer, because some C rows are eliminated before
reaching the left join, causing null-extended B rows to be added.

(I don't have a MySQL installation here to try, but if they still work
the way they used to, they get the wrong answer on the first query.)

The point of this example is just that there are cases where it'd be
incorrect for the planner to change the ordering of joins from what
is implied by the query syntax.  It is always safe to change the join
order when only inner joins are involved.  There are cases where outer
join order is safe to change too, but you need analysis code that checks
the query conditions to prove that a particular rearrangement is safe.
Right now, we don't have such code, and so we just follow the simple
rule "never rearrange any outer joins".

> I would have expected the optimizer to do the outer-joins last, as the
> extra data received by the outer-joins is not mandatory, and won't
> affect the rows that were retreived by joining user_account,
> address_list, and commercial_entity.

I think your example falls into the category of provably-safe
rearrangements ... but as I said, the planner doesn't know that.

> An outer join would *never* be the most restrictive
> join in a query.

Sure it can, if the restriction conditions are mainly on the outer
join's tables.  But that's not really the issue here.  As best I can
tell without seeing your data statistics, the most restrictive
conditions in your query are the ones on
commercial_entity.commercial_entity_id and user_account.user_role_id.
The trick is to apply those before joining any other tables.

regards, tom lane



Re: [PERFORM] Tuning/performance issue...

2003-10-01 Thread Jeff
On Tue, 30 Sep 2003, David Griffiths wrote:

>
> This is all part of a "migrate away from Oracle" project. We are looking
> at 3 databases - MySQL (InnoDB), Postgres and Matisse (object oriented).
> We have a lot of queries like this or worse, and I'm worried that many of
> them would need to be re-written. The developers know SQL, but nothing
> about tuning, etc.
>

There's a movement at my company to ditch several commercial DBs in favor
of a free one.  I'm currently the big pg fan around here and I've actually
written a rather lengthy presentation about pg features, why, tuning, etc.;
another part was some comparisons to other DBs.

So that I wouldn't be blindly flaming MySQL, I decided to give it a whirl
and loaded it up with the same dataset as pg.  The first thing I hit was
the lack of stored procedures, but I decided to code around that, giving
MySQL the benefit of the doubt.  What I found was interesting.

For 1-2 concurrent 'beaters' it screamed - ultra-fast.  But if you
increase the concurrent beaters up to, say, 20, MySQL comes to a grinding
halt; MySQL and the machine itself become fairly unresponsive.  And if you
do cache-unfriendly queries it becomes even worse.  On PG: no problems at
all.  It scaled up fine and dandy, and with 40 concurrent beaters the
machine was still responsive.  (The numbers for 20 clients were 220
seconds (pg) and 650 seconds (mysql).)

So that is another test to try out - Given your configuration I expect you
have lots of concurrent activity.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/





Re: [PERFORM] Tuning/performance issue....

2003-10-01 Thread Shridhar Daithankar
David Griffiths wrote:

And finally,

Here's the contents of the postgresql.conf file (I've been playing with
these settings the last couple of days, and using the guide @
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html to
make sure I didn't have it mis-tuned):

tcpip_socket = true
max_connections = 500        # We will need quite a few connections; currently only one connection to database, however
port = 5432
shared_buffers = 5000        # I've tried 5000 to 80,000 with no apparent difference
wal_buffers = 16
sort_mem = 256               # decreased this due to the large # of connections
effective_cache_size = 5     # read that this can improve performance; hasn't done anything.
Reading this whole thread, I think most of the improvement you would get
would come from rethinking your schema from a PG point of view and
examining each query.

After you changed your last query as Tom suggested to use explicit joins,
how much improvement did it make? I noticed that you put
'commercial_entity.commercial_entity_id=225528' as a second condition.
Does it make any difference to put it first in the WHERE clause list?
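
For readers following along, that comparison can be checked with EXPLAIN
ANALYZE.  This is only a sketch: the SELECT list and the remaining join
conditions from David's query are elided, and only the literal 225528
comes from the thread:

```sql
-- Run the query both ways and compare the actual times and row counts
-- that EXPLAIN ANALYZE reports.
-- Variant 1: the id restriction listed last in the WHERE clause.
EXPLAIN ANALYZE
SELECT ...
FROM commercial_entity, country, user_account, address_list ...
WHERE ...
  AND commercial_entity.commercial_entity_id = 225528;

-- Variant 2: the same restriction listed first.
EXPLAIN ANALYZE
SELECT ...
FROM commercial_entity, country, user_account, address_list ...
WHERE commercial_entity.commercial_entity_id = 225528
  AND ...;
```

(Postgres treats the WHERE clause as an unordered set of conditions, so
the textual order should not matter; EXPLAIN ANALYZE will confirm whether
the plans are identical.)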

 HTH

 Shridhar



Re: [PERFORM] Tuning/performance issue...

2003-09-30 Thread David Griffiths
> The most efficient way to handle this query would probably be to join
> the three tables with restrictions first, and then join the other tables
> to those.  You could force this with not too much rewriting using
> something like (untested, but I think it's right)
>
> ... FROM commercial_entity CROSS JOIN user_account CROSS JOIN
> address_list LEFT JOIN state_province ON address_list.state_province_id
> = state_province.state_province_id
> LEFT JOIN contact_info ON address_list.contact_info_id =
> contact_info.contact_info_id
> CROSS JOIN country
> WHERE ...
>
> The explicit JOINs associate left-to-right, so this gives the intended
> join order.  (In your original query, explicit JOIN binds more tightly
> than commas do.)

Ok - that's interesting - I'll have to do some reading and more testing.

> The reason PG's planner doesn't discover this join order for itself
> is that it's written to not attempt to re-order outer joins from the
> syntactically defined ordering.  In general, such reordering would
> change the results.  It is possible to analyze the query and prove that
> certain reorderings are valid (don't change the results), but we don't
> currently have code to do that.

Not sure I follow. Are you saying that, depending on when the outer join is
applied to the rows found at the time, you may end up with a different set
of rows? I would have expected the optimizer to do the outer joins last, as
the extra data received by the outer joins is not mandatory, and won't
affect the rows that were retrieved by joining user_account, address_list,
and commercial_entity.

An outer join would *never* be the most restrictive join in a query. I
thought (from my readings on Oracle query tuning) that finding the most
restrictive table/index was the first task of an optimizer: reduce the
result set as quickly as possible. That query has the line,

"AND commercial_entity.commercial_entity_id=225528",

which uses an index (the primary key) and an "=". I would have expected
that to be done first, then joined with the other inner-join tables, and
finally have the outer joins applied to the final result set to fill in the
"might be there" data.

Anyway, if the optimizer does the outer joins first (address_list with
state_province and contact_info), then it's picking the table with the most
rows (address_list has 200K+ rows, where the other 3 big tables have
70K-90K). Would re-ordering the FROM clause (and LEFT JOIN portions) help?

Could you give an example where applying an outer join at a different time
could result in different results? I think I can see a situation where you
use part of the outer join's results in the WHERE clause, but I'm not sure.

> I'm prepared to believe that Oracle contains code that actually does the
> analysis about which outer-join reorderings are valid, and is then able
> to find the right join order by deduction.

I'm not sure about Oracle (other than what I stated above). In fact, about
half the time, updating table stats to try to get the Oracle optimizer to
do a better job on a query results in even worse performance.

> ... FROM commercial_entity CROSS JOIN country) CROSS JOIN
>  user_account) CROSS JOIN address_list)
>  LEFT JOIN state_province ON ...)
>  LEFT JOIN contact_info ON ...
> WHERE ...
>
> This is clearly at odds with the SQL spec's syntactically defined join
> order semantics.  It's possible that it always yields the same results
> as the spec requires, but I'm not at all sure about that.

Again, I don't know. On the 3 queries based on these tables, Postgres
and MySQL return the exact same data (they use the same data set).

Do you have a link to the SQL spec's join-order requirements?

> In any case
> this strategy is certainly not "better" than ours, it just performs
> poorly on a different set of queries.  Would I be out of line to
> speculate that your query was previously tuned to work well in MySQL?

The query was pulled from our codebase (written for Oracle). I added a bit
to it to make it slower, and then ported it to MySQL and tested there first
(just re-wrote the outer-join syntax). I found that re-ordering the tables
in the FROM clause on MySQL changed the time by 45-ish% (0.36 seconds to
0.20 seconds), but that's because I had forgotten to re-analyze the tables
after refreshing the dataset. Now, table order doesn't make a difference in
speed (or results).

If anything, I've done more tuning for Postgres - added some extra indexes
to try to help (country.country_id had a composite index with another
column, but not an index for just it), etc.

The dataset and schema are pure Oracle. I extracted them out of the
database, removed all Oracle-specific extensions, changed the column types,
and migrated the indexes and foreign keys to MySQL and Postgres. Nothing
more (other than an extra index or two for Postgres - nada for MySQL).

This is all part of a "migrate away from Oracle" project. We are looking at
3 databases -
MySQL (InnoDB)

Re: [PERFORM] Tuning/performance issue...

2003-09-30 Thread Tom Lane
David Griffiths <[EMAIL PROTECTED]> writes:
> ... FROM commercial_entity, country, user_account,
> address_list LEFT JOIN state_province ON address_list.state_province_id
> = state_province.state_province_id
> LEFT JOIN contact_info ON address_list.contact_info_id =
> contact_info.contact_info_id
> WHERE ...

I believe what you're getting burnt by is that PG's planner interprets
this as forcing the address_list * state_province * contact_info join
to be done before it joins those tables to commercial_entity, country,
and user_account --- for discussion see
http://www.postgresql.org/docs/7.3/static/explicit-joins.html

Unfortunately your WHERE-clause restriction conditions are on
address_list, commercial_entity, and user_account; and it seems the
address_list constraint is very weak.  So the plan ends up forming a
large fraction of the address_list * state_province * contact_info join,
only to throw it away again when there's no matching rows selected from
commercial_entity and user_account.  The actual runtime and actual row
counts from the EXPLAIN ANALYZE output show that this is what's
happening.

The most efficient way to handle this query would probably be to join
the three tables with restrictions first, and then join the other tables
to those.  You could force this with not too much rewriting using
something like (untested, but I think it's right)

... FROM commercial_entity CROSS JOIN user_account CROSS JOIN
address_list LEFT JOIN state_province ON address_list.state_province_id
= state_province.state_province_id
LEFT JOIN contact_info ON address_list.contact_info_id =
contact_info.contact_info_id
CROSS JOIN country
WHERE ...

The explicit JOINs associate left-to-right, so this gives the intended
join order.  (In your original query, explicit JOIN binds more tightly
than commas do.)

The reason PG's planner doesn't discover this join order for itself
is that it's written to not attempt to re-order outer joins from the
syntactically defined ordering.  In general, such reordering would
change the results.  It is possible to analyze the query and prove that
certain reorderings are valid (don't change the results), but we don't
currently have code to do that.

> As a reference, our production Oracle database (exactly the same
> hardware, but RAID mirroring) with way more load can handle the query in
> 1-2 seconds. I have MySQL 4.0.14 with InnoDB on the same machine (shut
> down when I am testing Postgres, and vice versa) and it does the query in
> 0.20 seconds.

I'm prepared to believe that Oracle contains code that actually does the
analysis about which outer-join reorderings are valid, and is then able
to find the right join order by deduction.  The last I heard about
MySQL, they have no join-order analysis at all; they unconditionally
interpret this type of query left-to-right, ie as

... FROM commercial_entity CROSS JOIN country) CROSS JOIN
 user_account) CROSS JOIN address_list)
 LEFT JOIN state_province ON ...)
 LEFT JOIN contact_info ON ...
WHERE ...

This is clearly at odds with the SQL spec's syntactically defined join
order semantics.  It's possible that it always yields the same results
as the spec requires, but I'm not at all sure about that.  In any case
this strategy is certainly not "better" than ours, it just performs
poorly on a different set of queries.  Would I be out of line to
speculate that your query was previously tuned to work well in MySQL?

regards, tom lane



[PERFORM] Tuning/performance issue....

2003-09-30 Thread David Griffiths



And finally,

Here's the contents of the postgresql.conf file (I've been playing with
these settings the last couple of days, and using the guide @
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html to
make sure I didn't have it mis-tuned):

tcpip_socket = true
max_connections = 500        # We will need quite a few connections; currently only one connection to database, however
port = 5432
shared_buffers = 5000        # I've tried 5000 to 80,000 with no apparent difference
wal_buffers = 16
sort_mem = 256               # decreased this due to the large # of connections
effective_cache_size = 5     # read that this can improve performance; hasn't done anything.

The machine is a dual-Pentium 3 933MHz, with 2 gigabytes of RAM and a
3Ware RAID-5 card.

As a reference, our production Oracle database (exactly the same hardware,
but RAID mirroring) with way more load can handle the query in 1-2 seconds.
I have MySQL 4.0.14 with InnoDB on the same machine (shut down when I am
testing Postgres, and vice versa) and it does the query in 0.20 seconds.

Thanks for any insight.
David.


[PERFORM] Tuning/performance issue (part 2)

2003-09-30 Thread David Griffiths




Here's the schema:
 
    
            Table "public.address_list"
        Column        |          Type          | Modifiers
----------------------+------------------------+-----------
 address_list_id      | numeric(10,0)          | not null
 address_1            | character varying(100) |
 address_2            | character varying(100) |
 address_3            | character varying(100) |
 city                 | character varying(100) |
 zip_code             | character varying(20)  |
 phone_num_1          | character varying(100) |
 phone_num_2          | character varying(100) |
 phone_num_fax        | character varying(100) |
 state_province_id    | numeric(10,0)          |
 user_account_id      | numeric(10,0)          |
 marina_id            | numeric(10,0)          |
 commercial_entity_id | numeric(10,0)          |
 address_type_id      | numeric(10,0)          | not null
 distributor_id       | numeric(10,0)          |
 contact_info_id      | numeric(10,0)          |
 country_id           | numeric(10,0)          |
 lang_id              | numeric(10,0)          |
 boat_listing_id      | numeric(10,0)          |
Indexes: address_list_pkey primary key btree (address_list_id),
         addr_list_addr_type_id_i btree (address_type_id),
         addr_list_bl_id_i btree (boat_listing_id),
         addr_list_bl_sp_count_i btree (boat_listing_id, state_province_id, country_id),
         addr_list_ce_sp_c_at_c_i btree (commercial_entity_id, state_province_id, country_id, address_type_id, city),
         addr_list_ce_sp_countr_addr_type_i btree (commercial_entity_id, state_province_id, country_id, address_type_id),
         addr_list_ci_id_i btree (contact_info_id),
         addr_list_comm_ent_id_i btree (commercial_entity_id),
         addr_list_count_lang_i btree (country_id, lang_id),
         addr_list_country_id_i btree (country_id),
         addr_list_cty_bl_count_i btree (city, boat_listing_id, country_id),
         addr_list_cty_i btree (city),
         addr_list_distrib_id_i btree (distributor_id),
         addr_list_marina_id_i btree (marina_id),
         addr_list_sp_id_i btree (state_province_id),
         addr_list_ua_id_i btree (user_account_id)
Foreign Key constraints:
         $1 FOREIGN KEY (address_type_id) REFERENCES address_type(address_type_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $2 FOREIGN KEY (commercial_entity_id) REFERENCES commercial_entity(commercial_entity_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $3 FOREIGN KEY (contact_info_id) REFERENCES contact_info(contact_info_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $4 FOREIGN KEY (user_account_id) REFERENCES user_account(user_account_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $5 FOREIGN KEY (state_province_id) REFERENCES state_province(state_province_id) ON UPDATE NO ACTION ON DELETE NO ACTION
 
   
                 Table "public.commercial_entity"
          Column           |            Type             | Modifiers
---------------------------+-----------------------------+-----------
 commercial_entity_id      | numeric(10,0)               | not null
 company_name              | character varying(100)      | not null
 website                   | character varying(200)      |
 modify_date               | timestamp without time zone |
 user_account_id           | numeric(10,0)               |
 source_id                 | numeric(10,0)               | not null
 commercial_entity_type_id | numeric(10,0)               |
 boats_website             | character varying(200)      |
 updated_on                | timestamp without time zone | not null default ('now'::text)::timestamp(6) with time zone
 dealer_level_id           | numeric(10,0)               |
 lang_id                   | numeric(10,0)               | default '100'
 yw_account_id             | numeric(10,0)               |
 keybank_dealer_code       | numeric(10,0)               |
 dnetaccess_id             | numeric(10,0)               | not null default 0
 interested_in_dns         | numeric(10,0)               | not null default 0
 parent_office_id          | numeric(10,0)               |
 marinesite_welcome_msg    | character varying(500)      |
 alt_marinesite_homepage   | character varying(256)      |
 comments                  | character varying(4000)     |
 show_finance_yn           | character varying(1)        | not null default 'Y'
 show_insurance_yn         | character varying(1)        | not null default 'Y'
 show_shipping_yn          | character varying(1)        | not null default 'Y'
 yw_account_id_c           | character varying(11)       |
 sales_id                  | numeric(10,0)               |
Indexes: commercial_entity_pkey primary key btree (commercial_entity_id),

[PERFORM] Tuning/performance issue...

2003-09-30 Thread David Griffiths




We're having a problem with a query during our
investigation into Postgres (as an Oracle replacement). This query takes Postgres
20-40 seconds (multiple runs). Tom Lane recommended I post it here, with an
explain-analyze.
 
Here's the query:
 
EXPLAIN ANALYZE SELECT company_name, address_1, address_2, address_3, city,
address_list.state_province_id, state_province_short_desc, country_desc,
zip_code, address_list.country_id, contact_info.email,
commercial_entity.user_account_id, phone_num_1, phone_num_fax, website,
boats_website
FROM commercial_entity, country, user_account,
address_list LEFT JOIN state_province
    ON address_list.state_province_id = state_province.state_province_id
LEFT JOIN contact_info
    ON address_list.contact_info_id = contact_info.contact_info_id
WHERE address_list.address_type_id = 101
AND commercial_entity.commercial_entity_id = 225528
AND commercial_entity.commercial_entity_id = address_list.commercial_entity_id
AND address_list.country_id = country.country_id
AND commercial_entity.user_account_id = user_account.user_account_id
AND user_account.user_role_id IN (101, 101);
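(Not part of the original post: on 7.x the explicit-JOIN syntax constrains the join order the planner will consider, so one thing worth trying is to phrase the whole query as explicit joins starting from the single selective commercial_entity row, rather than letting the planner build the big address_list/contact_info merge first. A hypothetical, untested rewrite of the query above, using only the tables and columns it already references:)

```sql
-- Sketch only: semantically equivalent join order, anchored on the one
-- commercial_entity row so the planner works outward from it.
EXPLAIN ANALYZE
SELECT company_name, address_1, address_2, address_3, city,
       address_list.state_province_id, state_province_short_desc, country_desc,
       zip_code, address_list.country_id, contact_info.email,
       commercial_entity.user_account_id, phone_num_1, phone_num_fax,
       website, boats_website
FROM commercial_entity
JOIN user_account
    ON commercial_entity.user_account_id = user_account.user_account_id
JOIN address_list
    ON commercial_entity.commercial_entity_id = address_list.commercial_entity_id
JOIN country
    ON address_list.country_id = country.country_id
LEFT JOIN state_province
    ON address_list.state_province_id = state_province.state_province_id
LEFT JOIN contact_info
    ON address_list.contact_info_id = contact_info.contact_info_id
WHERE commercial_entity.commercial_entity_id = 225528
  AND address_list.address_type_id = 101
  AND user_account.user_role_id IN (101);
```

Whether this actually helps depends on the planner version and statistics; it is offered only as an experiment, not as the thread's conclusion.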
 
Here's the explain:
 
 Nested Loop  (cost=0.00..64570.33 rows=1 width=385) (actual time=42141.08..42152.06 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..64567.30 rows=1 width=361) (actual time=42140.80..42151.77 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..64563.97 rows=1 width=349) (actual time=42140.31..42151.27 rows=1 loops=1)
               Join Filter: ("outer".commercial_entity_id = "inner".commercial_entity_id)
               ->  Index Scan using commercial_entity_pkey on commercial_entity  (cost=0.00..5.05 rows=1 width=94) (actual time=0.57..0.58 rows=1 loops=1)
                     Index Cond: (commercial_entity_id = 225528::numeric)
               ->  Materialize  (cost=63343.66..63343.66 rows=97221 width=255) (actual time=41741.96..41901.17 rows=90527 loops=1)
                     ->  Merge Join  (cost=0.00..63343.66 rows=97221 width=255) (actual time=1.44..41387.68 rows=90527 loops=1)
                           Merge Cond: ("outer".contact_info_id = "inner".contact_info_id)
                           ->  Nested Loop  (cost=0.00..830457.52 rows=97221 width=222) (actual time=0.95..39178.32 rows=90527 loops=1)
                                 Join Filter: ("outer".state_province_id = "inner".state_province_id)
                                 ->  Index Scan using addr_list_ci_id_i on address_list  (cost=0.00..586676.65 rows=97221 width=205) (actual time=0.49..2159.90 rows=90527 loops=1)
                                       Filter: (address_type_id = 101::numeric)
                                 ->  Seq Scan on state_province  (cost=0.00..1.67 rows=67 width=17) (actual time=0.00..0.21 rows=67 loops=90527)
                           ->  Index Scan using contact_info_pkey on contact_info  (cost=0.00..3366.76 rows=56435 width=33) (actual time=0.44..395.75 rows=55916 loops=1)
         ->  Index Scan using user_account_pkey on user_account  (cost=0.00..3.32 rows=1 width=12) (actual time=0.46..0.46 rows=1 loops=1)
               Index Cond: ("outer".user_account_id = user_account.user_account_id)
               Filter: (user_role_id = 101::numeric)
   ->  Index Scan using country_pkey on country  (cost=0.00..3.01 rows=1 width=24) (actual time=0.25..0.25 rows=1 loops=1)
         Index Cond: ("outer".country_id = country.country_id)
 Total runtime: 42165.44 msec
(21 rows)
 
 
I will post the schema in a separate email - the list has rejected one big
email 3 times now.
 
David