Re: [PERFORM] Another weird one with an UPDATE
It's a slight improvement, but that could be other things as well. I'd read that how you tune Postgres will determine how the optimizer works on a query (sequential scan vs index scan). I am going to post all I've done with tuning tomorrow, and see if I've done anything dumb. I've found some contradictory advice, and I'm still a bit hazy on how/why Postgres trusts the OS to do caching. I'll post it all tomorrow.

Merge Join (cost=11819.21..15258.55 rows=12007 width=752) (actual time=4107.64..5587.81 rows=20880 loops=1)
  Merge Cond: ("outer".commercial_entity_id = "inner".commercial_entity_id)
  -> Index Scan using comm_serv_comm_ent_id_i on commercial_service cs (cost=0.00..3015.53 rows=88038 width=12) (actual time=0.05..487.23 rows=88038 loops=1)
  -> Sort (cost=11819.21..11846.08 rows=10752 width=740) (actual time=3509.07..3955.15 rows=25098 loops=1)
       Sort Key: ce.commercial_entity_id
       -> Merge Join (cost=0.00..9065.23 rows=10752 width=740) (actual time=0.18..2762.13 rows=7990 loops=1)
            Merge Cond: ("outer".user_account_id = "inner".user_account_id)
            -> Index Scan using user_account_pkey on user_account (cost=0.00..8010.39 rows=72483 width=716) (actual time=0.05..2220.86 rows=72483 loops=1)
            -> Index Scan using comm_ent_usr_acc_id_i on commercial_entity ce (cost=0.00..4787.69 rows=78834 width=24) (actual time=0.02..55.64 rows=7991 loops=1)
Total runtime: 226239.77 msec
(10 rows)

David

----- Original Message -----
From: Stephan Szabo [EMAIL PROTECTED]
To: David Griffiths [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Sunday, October 12, 2003 6:48 PM
Subject: Re: [PERFORM] Another weird one with an UPDATE

On Sun, 12 Oct 2003, David Griffiths wrote:

> [snip]
>
> > I think you want something like:
> >
> > UPDATE user_account SET last_name = 'abc'
> > WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service cs
> >               WHERE user_account.user_account_id = ce.user_account_id
> >               AND ce.commercial_entity_id = cs.commercial_entity_id);
>
> Unfortunately, this is still taking a long time.
Hmm, does

UPDATE user_account SET last_name = 'abc'
FROM commercial_entity ce, commercial_service cs
WHERE user_account.user_account_id = ce.user_account_id
  AND ce.commercial_entity_id = cs.commercial_entity_id;

give the right results... That might end up being faster.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
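A quick way to sanity-check Stephan's "does it give the right results?" question is to compare the set of rows the join form targets against the EXISTS form. This is a minimal sketch using SQLite as a stand-in for PostgreSQL; the table and column names mirror the thread's schema, but the data is invented:

```python
import sqlite3

# Invented miniature of the thread's schema: user 1 has entities with
# services, user 2 has an entity but no service, user 3 has nothing.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE user_account (user_account_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE commercial_entity (commercial_entity_id INTEGER PRIMARY KEY,
                                    user_account_id INTEGER);
    CREATE TABLE commercial_service (commercial_service_id INTEGER PRIMARY KEY,
                                     commercial_entity_id INTEGER);
    INSERT INTO user_account VALUES (1, 'aa'), (2, 'bb'), (3, 'cc');
    INSERT INTO commercial_entity VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO commercial_service VALUES (100, 10), (101, 11);
""")

# Rows the correlated EXISTS form would update:
exists_ids = con.execute("""
    SELECT user_account_id FROM user_account
    WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service cs
                  WHERE user_account.user_account_id = ce.user_account_id
                    AND ce.commercial_entity_id = cs.commercial_entity_id)
""").fetchall()

# Rows the UPDATE ... FROM join form would touch; note the raw join
# produces user 1 twice (one row per service), hence the DISTINCT.
join_ids = con.execute("""
    SELECT DISTINCT ua.user_account_id
    FROM user_account ua, commercial_entity ce, commercial_service cs
    WHERE ua.user_account_id = ce.user_account_id
      AND ce.commercial_entity_id = cs.commercial_entity_id
""").fetchall()

print(exists_ids, join_ids)  # [(1,)] [(1,)] - same target set
```

The join can match a target row several times (user 1 above), but an UPDATE still writes each target row once, which is why the two forms agree here.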
Re: [PERFORM] Another weird one with an UPDATE
> [snip]
>
> > I think you want something like:
> >
> > UPDATE user_account SET last_name = 'abc'
> > WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service cs
> >               WHERE user_account.user_account_id = ce.user_account_id
> >               AND ce.commercial_entity_id = cs.commercial_entity_id);

Unfortunately, this is still taking a long time.

Seq Scan on user_account (cost=0.00..748990.51 rows=36242 width=716) (actual time=10262.50..26568.03 rows=3771 loops=1)
  Filter: (subplan)
  SubPlan
    -> Nested Loop (cost=0.00..11.47 rows=1 width=24) (actual time=0.24..0.24 rows=0 loops=72483)
         -> Index Scan using comm_ent_usr_acc_id_i on commercial_entity ce (cost=0.00..4.12 rows=1 width=12) (actual time=0.05..0.05 rows=0 loops=72483)
              Index Cond: ($0 = user_account_id)
         -> Index Scan using comm_serv_comm_ent_id_i on commercial_service cs (cost=0.00..7.32 rows=3 width=12) (actual time=1.72..1.72 rows=0 loops=7990)
              Index Cond: ("outer".commercial_entity_id = cs.commercial_entity_id)
Total runtime: 239585.09 msec
(9 rows)

Anyone have any thoughts?

David
Re: [PERFORM] Another weird one with an UPDATE
Thanks - that worked.

David

----- Original Message -----
From: Stephan Szabo [EMAIL PROTECTED]
To: David Griffiths [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Saturday, October 11, 2003 3:34 PM
Subject: Re: [PERFORM] Another weird one with an UPDATE

On Sat, 11 Oct 2003, David Griffiths wrote:

> Sorry - just found the FAQ (http://jamesthornton.com/postgres/FAQ/faq-english.html#4.22) on how IN is very slow. So I rewrote the query:
>
> \o ./data/temp.txt
> SELECT current_timestamp;
> UPDATE user_account SET last_name = 'abc'
> WHERE EXISTS (SELECT ua.user_account_id
>               FROM user_account ua, commercial_entity ce, commercial_service cs
>               WHERE ua.user_account_id = ce.user_account_id
>               AND ce.commercial_entity_id = cs.commercial_entity_id);
> SELECT current_timestamp;

I don't think that's the query you want. You're not binding the subselect to the outer values of user_account. I think you want something like:

UPDATE user_account SET last_name = 'abc'
WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service cs
              WHERE user_account.user_account_id = ce.user_account_id
              AND ce.commercial_entity_id = cs.commercial_entity_id);
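Stephan's point about "not binding the subselect to the outer values" can be seen concretely. In the sketch below (SQLite standing in for PostgreSQL, invented data), the uncorrelated EXISTS is either true for every row or for none, so the first UPDATE rewrites the whole table; the correlated form touches only the rows that actually have a linked service:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE user_account (user_account_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE commercial_entity (commercial_entity_id INTEGER PRIMARY KEY,
                                    user_account_id INTEGER);
    CREATE TABLE commercial_service (commercial_service_id INTEGER PRIMARY KEY,
                                     commercial_entity_id INTEGER);
    INSERT INTO user_account VALUES (1, 'aa'), (2, 'bb'), (3, 'cc');
    INSERT INTO commercial_entity VALUES (10, 1), (12, 2);
    INSERT INTO commercial_service VALUES (100, 10);  -- only entity 10 has a service
""")

# David's original query: the subselect never references the outer row, so
# it is non-empty for *every* row of user_account - all 3 rows get updated.
uncorrelated = con.execute("""
    UPDATE user_account SET last_name = 'abc'
    WHERE EXISTS (SELECT ua.user_account_id
                  FROM user_account ua, commercial_entity ce, commercial_service cs
                  WHERE ua.user_account_id = ce.user_account_id
                    AND ce.commercial_entity_id = cs.commercial_entity_id)
""").rowcount

# Stephan's fix: the subselect references the outer user_account row,
# so only user 1 (whose entity has a service) matches.
correlated = con.execute("""
    UPDATE user_account SET last_name = 'xyz'
    WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service cs
                  WHERE user_account.user_account_id = ce.user_account_id
                    AND ce.commercial_entity_id = cs.commercial_entity_id)
""").rowcount

print(uncorrelated, correlated)  # 3 1
```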
Re: [PERFORM] PostgreSQL vs MySQL
This is a timely thread for myself, as I'm in the middle of testing both databases as an Oracle replacement. As of this moment, I know more about MySQL (tuning, setup, features) than I do about Postgres. Not because I like MySQL more, but because:

1) the MySQL docs are better (sorry - I found them easier to read, and more comprehensive; I had an easier time finding the answers I needed)
2) there are more web pages devoted to MySQL (probably because it has a bit more market share)
3) there are more books on MySQL at the bookstore (I haven't had a chance to pick up Bruce's book yet; it might be all the book I'd ever need)
4) we looked at MySQL first (we needed replication, and eRServer had not been open-sourced when we started looking)

With regards to #1, I'd like to specifically mention tuning - the docs at http://www.postgresql.org/docs/7.3/static/runtime-config.html give a basic explanation of the different options, but much more is needed for tuning. I'm running into a problem with an update statement (that uses a select in a sub-query) in Postgres - it's taking hours to run (the equivalent, using a multi-table update statement in MySQL instead of a sub-query, takes all of 2 seconds). I'll be posting it later once I do more reading, to make sure I've done as much as I can to solve it myself.

I really agree with this post:

"I guess my point is simply this: instead of saying: 'okay we use default settings that will run on _old_ hardware too' we should go for a little script that creates a 'still safe but much better' config file. There's just no point in setting SHARED_BUFFERS to something like 16 (what's the current default?) if the PC has >= 1 GB of RAM. Setting it to 8192 would still be safe, but 512 times better... ;-) (IIRC 8192 would take 64 MB of RAM, which should be safe if you leave the default MAX_CONNECTIONS.)"

It provides examples, and some real numbers, to help someone new to the database take an initial crack at tuning.
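For concreteness, the arithmetic in the quoted post (8192 buffers x 8 KB pages = 64 MB) would translate into a postgresql.conf fragment along these lines. This is a sketch only, not a recommendation from the thread; the effective_cache_size line is my addition:

```
# Sketch: assumes a dedicated box with >= 1 GB RAM and PostgreSQL 7.3's
# 8 KB pages; not values from the original post.
shared_buffers = 8192          # 8192 x 8 KB = 64 MB (vs. the tiny shipped default)
effective_cache_size = 65536   # assumption: ~512 MB of OS file cache, in 8 KB pages
```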
Remember, you're trying to compete with the big guys (Oracle, etc.), so providing info that an Oracle DBA needs is pretty critical. I'm currently at a complete loss for tuning Postgres (it seems to do things very differently than both Oracle and MySQL).

I also have to admit a bit of irritation reading this thread; there are a fair number of statements on this thread that, while not outright wrong, definitely aren't right:

"Speed depends on the nature of use and the complexity of queries. If you are doing updates of related tables, ACID is of vital importance and MySQL doesn't provide it."

MySQL has ACID in InnoDB. I've found that MySQL is actually very fast on complex queries with InnoDB (six tables, 1 million rows, two of the joins are outer joins). In fact, I can get InnoDB to be almost as fast as MyISAM. Complex updates are also very, very fast. We have not tried flooding either database with dozens of complex statements from multiple clients; that's coming soon, and from what I've read, MySQL won't do too well.

"using InnoDB tables (the only way to have foreign keys, transactions, and row level locking for MySQL) makes MySQL slower and adds complexity to tuning the database"

Adding "innodb_flush_method=O_DSYNC" to the my.cnf made InnoDB as fast as MyISAM in our tests. It doesn't turn off disk flushing; it's just a flush method that might work better with different kernels and drives; it's one of those "play with this and see if it helps" parameters; there are lots of those in Postgres, it seems. There are 10 variables for tuning InnoDB (and you don't have to tune for MyISAM, so it's actually six of one, half a dozen of the other). Setup between the two seems to be about the same.

"PostgreSQL supports constraints. MySQL doesn't; programmers need to take care of that from the client side"

Again, InnoDB supports constraints.

"Transactions: We've been here before. Suffice to say, MySQL+InnoDB is almost there. Plain ol' MySQL doesn't have it, which tells you something about their philosophy towards database design."

InnoDB supports transactions very nicely, has the equivalent of WAL, and one thing I really like: a tablespace (comprised of data files that can be spread around multiple hard drives), and in a month or so, InnoDB will support multiple tablespaces.

To be fair, here are a few MySQL bad things that weren't mentioned:

1) InnoDB can't do a hot backup with the basic backup tools. To hot-backup an InnoDB database, you need to pay $450 US per database per year ($1150 per database, perpetual) for a proprietary hot-backup tool.
2) InnoDB can't do full-text searching.
3) I see a lot more corrupt-database bugs on the MySQL lists (most are MyISAM, but a few InnoDB bugs pop up from time to time) - way more than I see on the Postgres lists.
4) There are some really cranky people on the MySQL lists; the Postgres lists seem to be much more effective (esp. with people like Tom Lane). Maybe it's because they get a lot of
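For reference, the InnoDB flush setting discussed above lives in my.cnf. A hedged sketch follows; the buffer-pool and log-file sizes are invented example values standing in for the "10 variables" mentioned, not numbers from the thread:

```
[mysqld]
# The flush method David mentions; whether it helps depends on kernel/drives.
innodb_flush_method = O_DSYNC
# Two of the ~10 InnoDB tunables referred to above - example values only.
innodb_buffer_pool_size = 256M
innodb_log_file_size = 64M
```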
[PERFORM] Tuning/performance issue...
We're having a problem with a query during our investigation into Postgres (as an Oracle replacement). This query takes Postgres 20-40 seconds (multiple runs). Tom Lane recommended I post it here, with an explain analyze.

Here's the query:

EXPLAIN ANALYZE
SELECT company_name, address_1, address_2, address_3, city,
       address_list.state_province_id, state_province_short_desc, country_desc,
       zip_code, address_list.country_id, contact_info.email,
       commercial_entity.user_account_id, phone_num_1, phone_num_fax, website,
       boats_website
FROM commercial_entity, country, user_account, address_list
     LEFT JOIN state_province ON address_list.state_province_id = state_province.state_province_id
     LEFT JOIN contact_info ON address_list.contact_info_id = contact_info.contact_info_id
WHERE address_list.address_type_id = 101
  AND commercial_entity.commercial_entity_id = 225528
  AND commercial_entity.commercial_entity_id = address_list.commercial_entity_id
  AND address_list.country_id = country.country_id
  AND commercial_entity.user_account_id = user_account.user_account_id
  AND user_account.user_role_id IN (101, 101);

Here's the explain:

Nested Loop (cost=0.00..64570.33 rows=1 width=385) (actual time=42141.08..42152.06 rows=1 loops=1)
  -> Nested Loop (cost=0.00..64567.30 rows=1 width=361) (actual time=42140.80..42151.77 rows=1 loops=1)
       -> Nested Loop (cost=0.00..64563.97 rows=1 width=349) (actual time=42140.31..42151.27 rows=1 loops=1)
            Join Filter: ("outer".commercial_entity_id = "inner".commercial_entity_id)
            -> Index Scan using commercial_entity_pkey on commercial_entity (cost=0.00..5.05 rows=1 width=94) (actual time=0.57..0.58 rows=1 loops=1)
                 Index Cond: (commercial_entity_id = 225528::numeric)
            -> Materialize (cost=63343.66..63343.66 rows=97221 width=255) (actual time=41741.96..41901.17 rows=90527 loops=1)
                 -> Merge Join (cost=0.00..63343.66 rows=97221 width=255) (actual time=1.44..41387.68 rows=90527 loops=1)
                      Merge Cond: ("outer".contact_info_id = "inner".contact_info_id)
                      -> Nested Loop (cost=0.00..830457.52 rows=97221 width=222) (actual time=0.95..39178.32 rows=90527 loops=1)
                           Join Filter: ("outer".state_province_id = "inner".state_province_id)
                           -> Index Scan using addr_list_ci_id_i on address_list (cost=0.00..586676.65 rows=97221 width=205) (actual time=0.49..2159.90 rows=90527 loops=1)
                                Filter: (address_type_id = 101::numeric)
                           -> Seq Scan on state_province (cost=0.00..1.67 rows=67 width=17) (actual time=0.00..0.21 rows=67 loops=90527)
                      -> Index Scan using contact_info_pkey on contact_info (cost=0.00..3366.76 rows=56435 width=33) (actual time=0.44..395.75 rows=55916 loops=1)
       -> Index Scan using user_account_pkey on user_account (cost=0.00..3.32 rows=1 width=12) (actual time=0.46..0.46 rows=1 loops=1)
            Index Cond: ("outer".user_account_id = user_account.user_account_id)
            Filter: (user_role_id = 101::numeric)
  -> Index Scan using country_pkey on country (cost=0.00..3.01 rows=1 width=24) (actual time=0.25..0.25 rows=1 loops=1)
       Index Cond: ("outer".country_id = country.country_id)
Total runtime: 42165.44 msec
(21 rows)

I will post the schema in a separate email - the list has rejected one big email 3 times now.

David
[PERFORM] Tuning/performance issue (part 2)
Here's the schema:

Table "public.address_list"
        Column        |          Type          | Modifiers
----------------------+------------------------+-----------
 address_list_id      | numeric(10,0)          | not null
 address_1            | character varying(100) |
 address_2            | character varying(100) |
 address_3            | character varying(100) |
 city                 | character varying(100) |
 zip_code             | character varying(20)  |
 phone_num_1          | character varying(100) |
 phone_num_2          | character varying(100) |
 phone_num_fax        | character varying(100) |
 state_province_id    | numeric(10,0)          |
 user_account_id      | numeric(10,0)          |
 marina_id            | numeric(10,0)          |
 commercial_entity_id | numeric(10,0)          |
 address_type_id      | numeric(10,0)          | not null
 distributor_id       | numeric(10,0)          |
 contact_info_id      | numeric(10,0)          |
 country_id           | numeric(10,0)          |
 lang_id              | numeric(10,0)          |
 boat_listing_id      | numeric(10,0)          |
Indexes: address_list_pkey primary key btree (address_list_id),
         addr_list_addr_type_id_i btree (address_type_id),
         addr_list_bl_id_i btree (boat_listing_id),
         addr_list_bl_sp_count_i btree (boat_listing_id, state_province_id, country_id),
         addr_list_ce_sp_c_at_c_i btree (commercial_entity_id, state_province_id, country_id, address_type_id, city),
         addr_list_ce_sp_countr_addr_type_i btree (commercial_entity_id, state_province_id, country_id, address_type_id),
         addr_list_ci_id_i btree (contact_info_id),
         addr_list_comm_ent_id_i btree (commercial_entity_id),
         addr_list_count_lang_i btree (country_id, lang_id),
         addr_list_country_id_i btree (country_id),
         addr_list_cty_bl_count_i btree (city, boat_listing_id, country_id),
         addr_list_cty_i btree (city),
         addr_list_distrib_id_i btree (distributor_id),
         addr_list_marina_id_i btree (marina_id),
         addr_list_sp_id_i btree (state_province_id),
         addr_list_ua_id_i btree (user_account_id)
Foreign Key constraints: $1 FOREIGN KEY (address_type_id) REFERENCES address_type(address_type_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $2 FOREIGN KEY (commercial_entity_id) REFERENCES commercial_entity(commercial_entity_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $3 FOREIGN KEY (contact_info_id) REFERENCES contact_info(contact_info_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $4 FOREIGN KEY (user_account_id) REFERENCES user_account(user_account_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $5 FOREIGN KEY (state_province_id) REFERENCES state_province(state_province_id) ON UPDATE NO ACTION ON DELETE NO ACTION

Table "public.commercial_entity"
          Column           |            Type             | Modifiers
---------------------------+-----------------------------+-----------
 commercial_entity_id      | numeric(10,0)               | not null
 company_name              | character varying(100)      | not null
 website                   | character varying(200)      |
 modify_date               | timestamp without time zone |
 user_account_id           | numeric(10,0)               |
 source_id                 | numeric(10,0)               | not null
 commercial_entity_type_id | numeric(10,0)               |
 boats_website             | character varying(200)      |
 updated_on                | timestamp without time zone | not null default ('now'::text)::timestamp(6) with time zone
 dealer_level_id           | numeric(10,0)               |
 lang_id                   | numeric(10,0)               | default '100'
 yw_account_id             | numeric(10,0)               |
 keybank_dealer_code       | numeric(10,0)               |
 dnetaccess_id             | numeric(10,0)               | not null default 0
 interested_in_dns         | numeric(10,0)               | not null default 0
 parent_office_id          | numeric(10,0)               |
 marinesite_welcome_msg    | character varying(500)      |
 alt_marinesite_homepage   | character varying(256)      |
 comments                  | character varying(4000)     |
 show_finance_yn           | character varying(1)        | not null default 'Y'
 show_insurance_yn         | character varying(1)        | not null default 'Y'
 show_shipping_yn          | character varying(1)        | not null default 'Y'
 yw_account_id_c           | character varying(11)       |
 sales_id                  | numeric(10,0)               |
Indexes: commercial_entity_pkey primary key btree (commercial_entity_id),
         comm_ent_boat_web_ui unique btree (boats_website),
         comm_ent_key_dlr_cd_ui unique btree (keybank_dealer_code),
         comm_ent_cny_name_i btree (company_name),
         comm_ent_dlr_lvl_id_i btree (dealer_level_id, lang_id),
         comm_ent_src_id_i btree (source_id),
         comm_ent_type_id_i btree (commercial_entity_type_id),
         comm_ent_upd_on btree (updated_on),
         comm_ent_usr_acc_id_i btree (user_account_id),
         comm_ent_yw_acc_id_i btree (yw_account_id)
Foreign Key constraints: $1 FOREIGN KEY (source_id) REFERENCES source(source_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
         $2 FOREIGN KEY (user_account_id) REFERENCES user_account(user_account_id) ON UPDATE NO ACTION ON DELETE NO ACTION

Table "public.country"
    Column    |          Type          | Modifiers
--------------+------------------------+-----------
 country_id   | numeric(10,0)          | not null
 lang_id      | numeric(10,0)          | not null
 country_desc | character varying(100) | not null
Indexes: country_pkey primary key btree (country_id)

Table "public.user_account"
     Column      |     Type      | Modifiers
-----------------+---------------+-----------
 user_account_id | numeric(10,0) | not
Re: [PERFORM] Tuning/performance issue...
> The most efficient way to handle this query would probably be to join the three tables with restrictions first, and then join the other tables to those. You could force this with not too much rewriting using something like (untested, but I think it's right)
>
> ... FROM commercial_entity CROSS JOIN user_account CROSS JOIN address_list
>     LEFT JOIN state_province ON address_list.state_province_id = state_province.state_province_id
>     LEFT JOIN contact_info ON address_list.contact_info_id = contact_info.contact_info_id
>     CROSS JOIN country
> WHERE ...
>
> The explicit JOINs associate left-to-right, so this gives the intended join order. (In your original query, explicit JOIN binds more tightly than commas do.)

Ok - that's interesting - I'll have to do some reading and more testing.

> The reason PG's planner doesn't discover this join order for itself is that it's written to not attempt to re-order outer joins from the syntactically defined ordering. In general, such reordering would change the results. It is possible to analyze the query and prove that certain reorderings are valid (don't change the results), but we don't currently have code to do that.

Not sure I follow. Are you saying that, depending on when the outer join is applied to the rows found at the time, you may end up with a different set of rows? I would have expected the optimizer to do the outer joins last, as the extra data retrieved by the outer joins is not mandatory, and won't affect the rows that were retrieved by joining user_account, address_list, and commercial_entity.

An outer join would *never* be the most restrictive join in a query. I thought (from my readings on Oracle query tuning) that finding the most restrictive table/index was the first task of an optimizer: reduce the result set as quickly as possible. That query has the line

  AND commercial_entity.commercial_entity_id = 225528

which uses an index (primary key) and uses an =. I would have expected that to be done first, then joined with the other inner-join tables, and finally have the outer joins applied to the final result set to fill in the "might be there" data.

Anyway, if the optimizer does the outer joins first (address_list with state_province and contact_info), then it's picking the table with the most rows (address_list has 200K+ rows, where the other 3 big tables have 70K-90K). Would re-ordering the FROM clause (and LEFT JOIN portions) help?

Could you give an example where applying an outer join at a different time could result in different results? I think I can see a situation where you use part of the results of the outer join in the WHERE clause, but I am not sure.

> I'm prepared to believe that Oracle contains code that actually does the analysis about which outer-join reorderings are valid, and is then able to find the right join order by deduction.

I'm not sure about Oracle (other than what I stated above). In fact, about half the time, updating table stats to try to get the Oracle optimizer to do a better job on a query results in even worse performance.

> ... FROM ((((commercial_entity CROSS JOIN country) CROSS JOIN user_account) CROSS JOIN address_list) LEFT JOIN state_province ON ...) LEFT JOIN contact_info ON ... WHERE ...
>
> This is clearly at odds with the SQL spec's syntactically defined join order semantics. It's possible that it always yields the same results as the spec requires, but I'm not at all sure about that.

Again, I don't know. On the 3 queries based on these tables, Postgres and MySQL return the exact same data (they use the same data set). Do you have a link to the SQL spec's join-order requirements?

> In any case this strategy is certainly not better than ours, it just performs poorly on a different set of queries. Would I be out of line to speculate that your query was previously tuned to work well in MySQL?

The query was pulled from our codebase (written for Oracle). I added a bit to it to make it slower, and then ported it to MySQL and tested there first (just re-wrote the outer-join syntax). I found that re-ordering the tables in the FROM clause on MySQL changed the time by 45-ish% (0.36 seconds to 0.20 seconds), but that's because I had forgotten to re-analyze the tables after refreshing the dataset. Now, table order doesn't make a difference in speed (or results).

If anything, I've done more tuning for Postgres - added some extra indexes to try to help (country.country_id had a composite index with another column, but not an index for just it), etc. The dataset and schema are pure Oracle. I extracted them out of the database, removed all Oracle-specific extensions, changed the column types, and migrated the indexes and foreign keys to MySQL and Postgres. Nothing more (other than an extra index or two for Postgres - nada for MySQL). This is all part of a migrate-away-from-Oracle project. We are looking at 3 databases - MySQL (InnoDB), Postgres and Matisse (object oriented). We
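On David's request for an example where applying an outer join at a different time changes the results: here is a hypothetical three-table case (SQLite, invented data, not from the thread). Applying an inner join after the LEFT JOIN discards the NULL-extended rows, while pushing the inner join inside the outer join keeps them:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, a_id INTEGER);
    CREATE TABLE c (id INTEGER, b_id INTEGER);
    INSERT INTO a VALUES (1), (2);      -- row 2 has no match in b
    INSERT INTO b VALUES (10, 1);
    INSERT INTO c VALUES (100, 10);
""")

# (a LEFT JOIN b) JOIN c: the later inner join on b.id removes the row
# where b's columns are NULL, so a.id = 2 disappears.
left_then_inner = con.execute("""
    SELECT a.id FROM a
    LEFT JOIN b ON a.id = b.a_id
    JOIN c ON b.id = c.b_id
""").fetchall()

# a LEFT JOIN (b JOIN c): the inner join happens first, and every row
# of a survives the outer join.
inner_then_left = con.execute("""
    SELECT a.id FROM a
    LEFT JOIN (b JOIN c ON b.id = c.b_id) ON a.id = b.a_id
""").fetchall()

print(len(left_then_inner), len(inner_then_left))  # 1 2
```

Same tables, same predicates; only the position of the outer join relative to the inner join differs, and the row counts disagree. This is the kind of reordering the planner must prove safe before attempting it.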
[PERFORM] Test...
I've posted several emails, and have yet to see one show up (this one might not either). Is there a size limit to an email (it had a big analyze, and schema information)? David