Re: [PERFORM] Another weird one with an UPDATE

2003-10-13 Thread David Griffiths
It's a slight improvement, but that could be other things as well.

I'd read that how you tune Postgres will determine how the optimizer works
on a query (sequential scan vs index scan). I am going to post all I've done
with tuning tommorow, and see if I've done anything dumb. I've found some
contradictory advice, and I'm still a bit hazy on how/why Postgres trusts
the OS to do caching. I'll post it all tommorow.




 Merge Join  (cost=11819.21..15258.55 rows=12007 width=752) (actual
time=4107.64..5587.81 rows=20880 loops=1)
   Merge Cond: (outer.commercial_entity_id = inner.commercial_entity_id)
   -  Index Scan using comm_serv_comm_ent_id_i on commercial_service cs
(cost=0.00..3015.53 rows=88038 width=12) (actual time=0.05..487.23
rows=88038 loops=1)
   -  Sort  (cost=11819.21..11846.08 rows=10752 width=740) (actual
time=3509.07..3955.15 rows=25098 loops=1)
 Sort Key: ce.commercial_entity_id
 -  Merge Join  (cost=0.00..9065.23 rows=10752 width=740) (actual
time=0.18..2762.13 rows=7990 loops=1)
   Merge Cond: (outer.user_account_id =
inner.user_account_id)
   -  Index Scan using user_account_pkey on user_account
(cost=0.00..8010.39 rows=72483 width=716) (actual time=0.05..2220.86
rows=72483 loops=1)
   -  Index Scan using comm_ent_usr_acc_id_i on
commercial_entity ce  (cost=0.00..4787.69 rows=78834 width=24) (actual
time=0.02..55.64 rows=7991 loops=1)
 Total runtime: 226239.77 msec
(10 rows)

David

- Original Message -
From: Stephan Szabo [EMAIL PROTECTED]
To: David Griffiths [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Sunday, October 12, 2003 6:48 PM
Subject: Re: [PERFORM] Another weird one with an UPDATE


 On Sun, 12 Oct 2003, David Griffiths wrote:

  [snip]
 
   I think you want something like:
   UPDATE user_account SET last_name = 'abc'
WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service
cs
WHERE user_account.user_account_id = ce.user_account_id AND
ce.commercial_entity_id = cs.commercial_entity_id);
 
  Unfort, this is still taking a long time.

 Hmm, does
 UPDATE user_account SET last_name='abc'
  FROM commercial_entity ce, commercial_service cs
  WHERE user_account.user_account_id = ce.user_account_id AND
   ce.commercial_entity_id=cs.commercial_entity_id;
 give the right results... That might end up being faster.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Another weird one with an UPDATE

2003-10-12 Thread David Griffiths
[snip]

 I think you want something like:
 UPDATE user_account SET last_name = 'abc'
  WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service cs
  WHERE user_account.user_account_id = ce.user_account_id AND
  ce.commercial_entity_id = cs.commercial_entity_id);

Unfort, this is still taking a long time.



---
 Seq Scan on user_account  (cost=0.00..748990.51 rows=36242 width=716)
(actual time=10262.50..26568.03 rows=3771 loops=1)
   Filter: (subplan)
   SubPlan
 -  Nested Loop  (cost=0.00..11.47 rows=1 width=24) (actual
time=0.24..0.24 rows=0 loops=72483)
   -  Index Scan using comm_ent_usr_acc_id_i on commercial_entity
ce  (cost=0.00..4.12 rows=1 width=12) (actual time=0.05..0.05 rows=0
loops=72483)
 Index Cond: ($0 = user_account_id)
   -  Index Scan using comm_serv_comm_ent_id_i on
commercial_service cs  (cost=0.00..7.32 rows=3 width=12) (actual
time=1.72..1.72 rows=0 loops=7990)
 Index Cond: (outer.commercial_entity_id =
cs.commercial_entity_id)
 Total runtime: 239585.09 msec
(9 rows)

Anyone have any thoughts?

David

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Another weird one with an UPDATE

2003-10-11 Thread David Griffiths
Thanks - that worked.

David
- Original Message -
From: Stephan Szabo [EMAIL PROTECTED]
To: David Griffiths [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Saturday, October 11, 2003 3:34 PM
Subject: Re: [PERFORM] Another weird one with an UPDATE


 On Sat, 11 Oct 2003, David Griffiths wrote:

  Sorry - just found the FAQ (
  http://jamesthornton.com/postgres/FAQ/faq-english.html#4.22
  http://jamesthornton.com/postgres/FAQ/faq-english.html#4.22 ) on how
  IN is very slow.
 
  So I rewrote the query:
 
  \o ./data/temp.txt
  SELECT current_timestamp;
  UPDATE user_account SET last_name = 'abc'
  WHERE EXISTS (SELECT ua.user_account_id FROM user_account ua,
  commercial_entity ce, commercial_service cs
  WHERE ua.user_account_id = ce.user_account_id AND
  ce.commercial_entity_id = cs.commercial_entity_id);
  SELECT current_timestamp;

 I don't think that's the query you want.  You're not binding the subselect
 to the outer values of user_account.

 I think you want something like:
 UPDATE user_account SET last_name = 'abc'
  WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service cs
  WHERE user_account.user_account_id = ce.user_account_id AND
  ce.commercial_entity_id = cs.commercial_entity_id);

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL vs MySQL

2003-10-09 Thread David Griffiths



This is a timely thread for myself, as I'm in the 
middle of testing both databases as an Oracle replacement.

As of this moment, I know more about MySQL (tuning, 
setup, features) than I do about Postgres. Not because I like MySQL more, but 
because

1)the MySQL docs are better (sorry - I found 
them easier to read, and more comprehensive; I had an easier time finding the 
answers I needed)
2)there are more web pages devoted to MySQL 
(probably because it has a bit more market share)
3)there are more books on MySQL at the 
bookstore (I haven't had a chance to pick up Bruce's book yet; it might be all 
the book I'd ever need)
4)we looked at MySQL first (we needed 
replication, and eRServer had not been open-sourced when we started 
looking)

With regards to #1, I'd like to specifically 
mention tuning - the docs at http://www.postgresql.org/docs/7.3/static/runtime-config.htmlgive 
a basic explanation of the different options, but much more is needed for 
tuning. I'm running into a problem with an update statement (that uses a select 
in a sub-query) in Postgres - it's taking hours to run (the equiv, using a 
multi-table update statement in MySQL instead of a sub-query, takes all of 2 
seconds). I'll be posting it later once I do more reading to make sure I've done 
as much as I can to solve it myself.

I really agree with this post:

"I guess my point is simply this: instead of 
saying: "okay we use default settings that will run on _old_ hardware too" we 
should go for a little script that creates a "still save but much better" config 
file. There's just no point in setting SHARED_BUFFERS to something like 16 
(what's the current default?) if the PC has = 1 GB of RAM. Setting it to 
8192 would still be save, but 512 times better... ;-) (IIRC 8192 would 
take 64 MB of RAM, which should be save if you leave the default 
MAX_CONNECTIONS.)" It provides examples, and some real numbers to help someone 
new to the database take an initial crack at tuning. Remember, you're trying to 
compete with the big-guys (Oracle, etc), so providing info that an Oracle DBA 
needs is pretty critical. I'm currently at a complete loss for tuning Postgres 
(it seems to do things very differently than both Oracle and 
MySQL).


I also have to admit a bit of irritation reading 
this thread; there is a fair number of incorrect statements on this thread that, 
while not wrong, definately aren't right:

"Speed depends on the nature of use and the 
complexity of queries. If you are doing updates of related tables, ACID is 
of vital importance and MySQL doesn't provide it."
MySQL has ACID in InnoDB. I've found that MySQL is 
actually very fast on complex queries w/InnoDB (six tables, 1 million rows, two 
of the joins are outer-joins. In fact, I can get InnoDB to be almost as fast as 
MyISAM. Complex updates are also very very fast. We have not tried flooding 
either database with dozens of complex statements from multiple clients; that's 
coming soon, and from what I've read, MySQL won't do too well.

"using InnoDB tables (the only way to have foreign 
keys,transactions, and row level locking for MySQL) makes MySQL slower 
andadds complexity to tuning the database"
Adding this: "innodb_flush_method=O_DSYNC" to the 
my.cnf made InnoDB as fast as MyISAM in our tests. It doesn't turn off disk 
flushing; it's just a flush method that might work better with different kernels 
and drives; it's one of those "play with this and see if it helps" parameters; 
there are lots of those in Postgres, it seems. There are 10 variables for tuning 
InnoDB (and you don't have to tune for MyISAM, so it's actually a six-of-one, 
half-dozen-of-the-other). Setup between the two seems to be about the 
same.

"PostgreSQL supports constraints. MySQL doesn't; 
programmers need to take care of that from the client side"
Again, InnoDB supports constraints.

"Transactions: We've been here before. Suffice to 
say, MySQL+InnoDB is almost there. Plain ol' MySQL doesn't have it, which tells 
you something about their philosophy towards database design."
InnoDB supports transactions very nicely, has the 
equivalent of WAL, and one thing I really like: a tablespace (comprised of data 
files that can be spread around multiple hard drives), and in a month or so, 
InnoDB will support multiple tablespaces.


To be fair, here are a few MySQL "bad-things" that 
weren't mentioned:

1) InnoDB can't do a hot-backup with the basic 
backup tools. To hot-backup an InnoDB database, you need to pay $450 US per 
database per year ($1150 per database perpetual) for a proprietary hot-backup 
tool
2) InnoDB can't do full-text 
searching.
3) I see alot more corrupt-database bugs on the 
MySQL lists (most are MyISAM, but a few InnoDB bugs pop up from time to time) - 
way more than I see on the Postgres lists.
4) There are some really cranky people on the MySQL 
lists; the Postgres lists seem to be much more effective (esp. with people like 
Tom Lane). Maybe it's because they get alot of 

[PERFORM] Tuning/performance issue...

2003-09-30 Thread David Griffiths




We're having a problem with a query during our 
investigation into Postgres (as an Oracle replacement). This query Postgres 
takes 20-40 seconds (multiple runs). Tom Lan recommended I post it here, with an 
explain-analyze.

Here's the query:

EXPLAIN ANALYZE SELECT company_name, address_1, 
address_2, address_3, city,address_list.state_province_id, 
state_province_short_desc, country_desc, zip_code, 
address_list.country_id,contact_info.email, 
commercial_entity.user_account_id, phone_num_1, phone_num_fax, website, 
boats_websiteFROM commercial_entity, country, user_account,address_list 
LEFT JOIN state_province ON address_list.state_province_id = 
state_province.state_province_idLEFT JOIN contact_info ON 
address_list.contact_info_id = contact_info.contact_info_idWHERE 
address_list.address_type_id = 101AND 
commercial_entity.commercial_entity_id=225528AND 
commercial_entity.commercial_entity_id = 
address_list.commercial_entity_idAND address_list.country_id = 
country.country_idAND commercial_entity.user_account_id = 
user_account.user_account_idAND user_account.user_role_id IN (101, 
101);

Here's the explain:

Nested 
Loop (cost=0.00..64570.33 rows=1 width=385) (actual 
time=42141.08..42152.06 rows=1 loops=1) - Nested 
Loop (cost=0.00..64567.30 rows=1 width=361) (actual 
time=42140.80..42151.77 rows=1 
loops=1) - Nested 
Loop (cost=0.00..64563.97 rows=1 width=349) (actual 
time=42140.31..42151.27 rows=1 
loops=1) 
Join Filter: ("outer".commercial_entity_id = 
"inner".commercial_entity_id) 
- Index Scan using commercial_entity_pkey on commercial_entity 
(cost=0.00..5.05 rows=1 width=94) (actual time=0.57..0.58 rows=1 
loops=1) 
Index Cond: (commercial_entity_id = 
225528::numeric) 
- Materialize (cost=63343.66..63343.66 rows=97221 width=255) 
(actual time=41741.96..41901.17 rows=90527 
loops=1) 
- Merge Join (cost=0.00..63343.66 rows=97221 width=255) (actual 
time=1.44..41387.68 rows=90527 
loops=1) 
Merge Cond: ("outer".contact_info_id = 
"inner".contact_info_id) 
- Nested Loop (cost=0.00..830457.52 rows=97221 width=222) 
(actual time=0.95..39178.32 rows=90527 
loops=1) 
Join Filter: ("outer".state_province_id = 
"inner".state_province_id) 
- Index Scan using addr_list_ci_id_i on address_list 
(cost=0.00..586676.65 rows=97221 width=205) (actual time=0.49..2159.90 
rows=90527 
loops=1) 
Filter: (address_type_id = 
101::numeric) 
- Seq Scan on state_province (cost=0.00..1.67 rows=67 width=17) 
(actual time=0.00..0.21 rows=67 
loops=90527) 
- Index Scan using contact_info_pkey on contact_info 
(cost=0.00..3366.76 rows=56435 width=33) (actual time=0.44..395.75 rows=55916 
loops=1) - Index 
Scan using user_account_pkey on user_account (cost=0.00..3.32 rows=1 
width=12) (actual time=0.46..0.46 rows=1 
loops=1) 
Index Cond: ("outer".user_account_id = 
user_account.user_account_id) 
Filter: (user_role_id = 101::numeric) - Index Scan 
using country_pkey on country (cost=0.00..3.01 rows=1 width=24) (actual 
time=0.25..0.25 rows=1 
loops=1) Index Cond: 
("outer".country_id = country.country_id)Total runtime: 42165.44 
msec(21 rows)


I will post the schema in a seperate email - the list has rejected one big 
email 3 times now.

David


[PERFORM] Tuning/performance issue (part 2)

2003-09-30 Thread David Griffiths




Here's the schema:

 
Table "public.address_list" 
Column 
| 
Type | 
Modifiers--++---address_list_id 
| numeric(10,0) | not 
nulladdress_1 
| character varying(100) 
|address_2 
| character varying(100) 
|address_3 
| character varying(100) 
|city 
| character varying(100) 
|zip_code 
| character varying(20) 
|phone_num_1 | 
character varying(100) 
|phone_num_2 | 
character varying(100) 
|phone_num_fax | character 
varying(100) |state_province_id | 
numeric(10,0) 
|user_account_id | 
numeric(10,0) 
|marina_id 
| numeric(10,0) 
|commercial_entity_id | 
numeric(10,0) 
|address_type_id | 
numeric(10,0) | not 
nulldistributor_id | 
numeric(10,0) 
|contact_info_id | 
numeric(10,0) 
|country_id 
| numeric(10,0) 
|lang_id 
| numeric(10,0) 
|boat_listing_id | 
numeric(10,0) 
|Indexes: address_list_pkey primary key btree 
(address_list_id), 
addr_list_addr_type_id_i btree 
(address_type_id), 
addr_list_bl_id_i btree 
(boat_listing_id), 
addr_list_bl_sp_count_i btree (boat_listing_id, state_province_id, 
country_id), 
addr_list_ce_sp_c_at_c_i btree (commercial_entity_id, state_province_id, 
country_id, address_type_id, 
city), 
addr_list_ce_sp_countr_addr_type_i btree (commercial_entity_id, 
state_province_id, country_id, 
address_type_id), 
addr_list_ci_id_i btree 
(contact_info_id), 
addr_list_comm_ent_id_i btree 
(commercial_entity_id), 
addr_list_count_lang_i btree (country_id, 
lang_id), 
addr_list_country_id_i btree 
(country_id), 
addr_list_cty_bl_count_i btree (city, boat_listing_id, 
country_id), addr_list_cty_i 
btree (city), 
addr_list_distrib_id_i btree 
(distributor_id), 
addr_list_marina_id_i btree 
(marina_id), 
addr_list_sp_id_i btree 
(state_province_id), 
addr_list_ua_id_i btree (user_account_id)Foreign Key constraints: $1 FOREIGN 
KEY (address_type_id) REFERENCES address_type(address_type_id) ON UPDATE NO 
ACTION ON DELETE NO 
ACTION, 
$2 FOREIGN KEY (commercial_entity_id) REFERENCES 
commercial_entity(commercial_entity_id) ON UPDATE NO ACTION ON DELETE NO 
ACTION, 
$3 FOREIGN KEY (contact_info_id) REFERENCES contact_info(contact_info_id) ON 
UPDATE NO ACTION ON DELETE NO 
ACTION, 
$4 FOREIGN KEY (user_account_id) REFERENCES user_account(user_account_id) ON 
UPDATE NO ACTION ON DELETE NO 
ACTION, 
$5 FOREIGN KEY (state_province_id) REFERENCES state_province(state_province_id) 
ON UPDATE NO ACTION ON DELETE NO ACTION

 
Table 
"public.commercial_entity" 
Column 
| 
Type 
| 
Modifiers---+-+-commercial_entity_id 
| 
numeric(10,0) 
| not 
nullcompany_name 
| character varying(100) | not 
nullwebsite 
| character varying(200) 
|modify_date 
| timestamp without time zone 
|user_account_id 
| 
numeric(10,0) 
|source_id 
| 
numeric(10,0) 
| not nullcommercial_entity_type_id | 
numeric(10,0) 
|boats_website 
| character varying(200) 
|updated_on 
| timestamp without time zone | not null default ('now'::text)::timestamp(6) 
with time 
zonedealer_level_id 
| 
numeric(10,0) 
|lang_id 
| 
numeric(10,0) 
| default 
'100'yw_account_id 
| 
numeric(10,0) 
|keybank_dealer_code | 
numeric(10,0) 
|dnetaccess_id 
| 
numeric(10,0) 
| not null default 
0interested_in_dns | 
numeric(10,0) 
| not null default 
0parent_office_id 
| 
numeric(10,0) 
|marinesite_welcome_msg | character 
varying(500) 
|alt_marinesite_homepage | character 
varying(256) 
|comments 
| character varying(4000) 
|show_finance_yn 
| character varying(1) | not null 
default 
'Y'show_insurance_yn | 
character varying(1) | not null 
default 
'Y'show_shipping_yn 
| character varying(1) | not null 
default 
'Y'yw_account_id_c 
| character varying(11) 
|sales_id 
| 
numeric(10,0) 
|Indexes: commercial_entity_pkey primary key btree 
(commercial_entity_id), 
comm_ent_boat_web_ui unique btree 
(boats_website), 
comm_ent_key_dlr_cd_ui unique btree 
(keybank_dealer_code), 
comm_ent_cny_name_i btree 
(company_name), 
comm_ent_dlr_lvl_id_i btree (dealer_level_id, 
lang_id), comm_ent_src_id_i 
btree (source_id), 
comm_ent_type_id_i btree 
(commercial_entity_type_id), 
comm_ent_upd_on btree 
(updated_on), 
comm_ent_usr_acc_id_i btree 
(user_account_id), 
comm_ent_yw_acc_id_i btree (yw_account_id)Foreign Key constraints: $1 
FOREIGN KEY (source_id) REFERENCES source(source_id) ON UPDATE NO ACTION ON 
DELETE NO 
ACTION, 
$2 FOREIGN KEY (user_account_id) REFERENCES user_account(user_account_id) ON 
UPDATE NO ACTION ON DELETE NO ACTION

 
Table "public.country" Column 
| 
Type | 
Modifiers--++---country_id 
| numeric(10,0) | not 
nulllang_id | 
numeric(10,0) | not 
nullcountry_desc | character varying(100) | not nullIndexes: 
country_pkey primary key btree (country_id)

 
Table 
"public.user_account" 
Column 
| 
Type 
| 
Modifiers---+-+-user_account_id 
| 
numeric(10,0) 
| not 

Re: [PERFORM] Tuning/performance issue...

2003-09-30 Thread David Griffiths
 The most efficient way to handle this query would probably be to join
 the three tables with restrictions first, and then join the other tables
 to those.  You could force this with not too much rewriting using
 something like (untested, but I think it's right)

 ... FROM commercial_entity CROSS JOIN user_account CROSS JOIN
 address_list LEFT JOIN state_province ON address_list.state_province_id
 = state_province.state_province_id
 LEFT JOIN contact_info ON address_list.contact_info_id =
 contact_info.contact_info_id
 CROSS JOIN country
 WHERE ...

 The explicit JOINs associate left-to-right, so this gives the intended
 join order.  (In your original query, explicit JOIN binds more tightly
 than commas do.)

Ok - that's interesting - I'll have to do some reading and more testing.

 The reason PG's planner doesn't discover this join order for itself
 is that it's written to not attempt to re-order outer joins from the
 syntactically defined ordering.  In general, such reordering would
 change the results.  It is possible to analyze the query and prove that
 certain reorderings are valid (don't change the results), but we don't
 currently have code to do that.

Not sure I follow. Are you saying that, depending on when the outer-join is
applied to the rows found at the time, you may end up with a different set
of rows? I would have expected the optimizer to do the outer-joins last, as
the extra data received by the outer-joins is not mandatory, and won't
affect
the rows that were retreived by joining user_account, address_list, and
commercial_entity.

An outer join would *never* be the most restrictive
join in a query. I thought (from my readings on Oracle query tuning) that
finding the most restrictive table/index was the first task of an optimizer.
Reduce the result set as quickly as possible. That query has the line,

AND commercial_entity.commercial_entity_id=225528,

which uses an index (primary key) and uses an =. I would have expected
that to be done first, then joined with the other inner-join tables, and
finally
have the outer-joins applied to the final result set to fill in the might
be there data.

Anyway, if the optimizer does the outer-joins first (address_list with
state_province
and contact_info), then it's picking the table with the most rows
(address_list has
200K+ rows, where the other 3 big tables have 70K-90K). Would re-ordering
the FROM clause (and LEFT JOIN portions) help?

Could you give an example where applying an outer-join at a different time
could
result in different results? I think I can see at situation where you use
part of the results
in the outer-join in the where clause, but I am not sure.

 I'm prepared to believe that Oracle contains code that actually does the
 analysis about which outer-join reorderings are valid, and is then able
 to find the right join order by deduction.

I'm not sure about Oracle (other than what I stated above). In fact, about
half
the time, updating table stats to try to get the Oracle optimizer to do a
better
job on a query results in even worse performance.

 ... FROM commercial_entity CROSS JOIN country) CROSS JOIN
  user_account) CROSS JOIN address_list)
  LEFT JOIN state_province ON ...)
  LEFT JOIN contact_info ON ...
 WHERE ...

 This is clearly at odds with the SQL spec's syntactically defined join
 order semantics.  It's possible that it always yields the same results
 as the spec requires, but I'm not at all sure about that.

Again, I don't know. On the 3 queries based on these tables, Postgres
and MySQL return the exact same data (they use the same data set).

Do you have a link to the SQL spec's join-order requirements?

 In any case
 this strategy is certainly not better than ours, it just performs
 poorly on a different set of queries.  Would I be out of line to
 speculate that your query was previously tuned to work well in MySQL?

The query was pulled from our codebase (written for Oracle). I added a bit
to it
to make it slower, and then ported to MySQL and tested there first (just
re-wrote
the outer-join syntax). I found that  re-ordering the tables in the
from-clause on
MySQL changed the time by 45-ish% (0.36 seconds to .20 seconds), but that's
because I had forgotten to re-analyze the tables after refreshing the
dataset.
Now, table order doesn't make a difference in speed (or results).

If anything, I've done more tuning for Postgres - added some extra indexes
to try to help
(country.country_id had a composite index with another column, but not an
index for
just it), etc.

The dataset and schema is pure-Oracle. I extracted it out of the database,
removed all
Oracle-specific extensions, changed the column types, and migrated the
indexes and
foreign keys to MySQL and Postgres. Nothing more (other than an extra index
or two for Postgres - nada for MySQL).

This is all part of a migrate away from Oracle project. We are looking at
3 databases -
MySQL (InnoDB), Postgres and Matisse (object oriented). We 

[PERFORM] Test...

2003-09-29 Thread David Griffiths



I've posted several emails, and have yet to see one 
show up (this one might not either).

Is there a size limit to an email (it had a big 
analyze, and schema information)??
David