RE: query optimization question (my struggle against 'using temporary; using filesort')
Hi Ciaran, So I think there's a couple things going on: 1. The explain plan for your slow query looks wrong, such as mysql is confused. It's possible your index statistics are incorrect. Try ANALYZE TABLE on listings and addresses. I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order. That should get rid of the temp table and filesort operations and give faster results. SELECT STRAIGHT_JOIN listings.*, addresses.* FROM `listings` JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY listings.id DESC LIMIT 1 2. I need to make some comments about your table design: This column is AWFUL: `addressable_type` varchar(255) DEFAULT NULL, Why have field that hold up to 255 characters and put a little string in it like Listing? Why does it matter? Well it makes your indexes disasterously bloated: KEY `index_addresses_on_parent_city_id_and_addressable_type` (`parent_city_id`,`addressable_type`), If you noticed in the explain plan, that index is 733 bytes *per row*. Especially using utf8 means each character takes 3 bytes in the index. That's terrible. That type field should be a foreign key tinyint or at the very least be a much much shorter varchar field (such as 8 or 10) You have lots of those varchar(255) fields, which looks like lazy design -- by not gathering correct requirements and designing accordingly you will hurt your database performance, waste disk space and cause yourself all kinds of future problems. 3. Why are you using OUTER JOIN? It looks to me like you're using it because you don't know the difference, since you're not looking for NULL rows or anything. In fact, it looks like mysql is smart enough to know that you've negated the OUTER JOIN by putting conditions on the joined tables in the WHERE clause, and convert then to INNER JOINS. Don't rely on that! Use the correct join type. Those queries Regards, Gavin Towey -Original Message- From: Ciaran Lee [mailto:ciaran@gmail.com] Sent: Tuesday, September 22, 2009 1:32 PM To: mysql@lists.mysql.com Subject: query optimization question (my struggle against 'using temporary; using filesort') Hi, I hope this is the right place to ask a question about query optimization. Background: I have a database which has events, which occur in places (listings). Places have addresses, and addresses belong to a city. I can select the latest event within a particular city very efficiently (less than 2ms), but selecting the latest listing within a city is REALLY slow (10-20 seconds) despite being almost a subset of the event query. I have been working on this for about a day, and have tried all sorts of tweaks to the indexes but to no avail. I always seem to end up with 'using temporary; using filesort' as the 'extra' content in the explain result. If anyone has a suggestion for what I might do to fix this, I'd really appreciate it. If not, I could further de-normalize the database for performance reasons, but I would feel dirty for doing so. Here is the fast query (select the latest event within a particular city), and it's explain. SELECT events.*, listings.*, addresses.* FROM `events` LEFT OUTER JOIN `listings` ON `listings`.id = `events`.listing_id LEFT OUTER JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY events.id DESC LIMIT 1 ++-+---++++-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---++++-+--+--+-+ | 1 | SIMPLE | events| index | index_events_on_listing_id | PRIMARY | 4 | NULL |1 | | | 1 | SIMPLE | listings | eq_ref | PRIMARY | PRIMARY | 4 | ratemyarea.events.listing_id |1 | Using where | | 1 | SIMPLE | addresses | ref| index_addresses_on_parent_city_id_and_addressable_type,index_addresses_on_addressable_type_and_addressable_id,addressable_id_type_city | index_addresses_on_addressable_type_and_addressable_id | 773 | const, ratemyarea.listings.id |1 | Using
Re: query optimization question (my struggle against 'using temporary; using filesort')
Hi Gavin,Thanks very much, I'll implement as many of your suggestions as possible. The varchar(255)'s are inexcusable and I feel suitably ashamed :) The queries were generated by ActiveRecord (an ORM library for Ruby), although even if I had written them myself they would probably not be much better. Regards, Ciaran Lee 2009/9/24 Gavin Towey gto...@ffn.com Hi Ciaran, So I think there's a couple things going on: 1. The explain plan for your slow query looks wrong, such as mysql is confused. It's possible your index statistics are incorrect. Try ANALYZE TABLE on listings and addresses. I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order. That should get rid of the temp table and filesort operations and give faster results. SELECT STRAIGHT_JOIN listings.*, addresses.* FROM `listings` JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY listings.id DESC LIMIT 1 2. I need to make some comments about your table design: This column is AWFUL: `addressable_type` varchar(255) DEFAULT NULL, Why have field that hold up to 255 characters and put a little string in it like Listing? Why does it matter? Well it makes your indexes disasterously bloated: KEY `index_addresses_on_parent_city_id_and_addressable_type` (`parent_city_id`,`addressable_type`), If you noticed in the explain plan, that index is 733 bytes *per row*. Especially using utf8 means each character takes 3 bytes in the index. That's terrible. That type field should be a foreign key tinyint or at the very least be a much much shorter varchar field (such as 8 or 10) You have lots of those varchar(255) fields, which looks like lazy design -- by not gathering correct requirements and designing accordingly you will hurt your database performance, waste disk space and cause yourself all kinds of future problems. 3. Why are you using OUTER JOIN? It looks to me like you're using it because you don't know the difference, since you're not looking for NULL rows or anything. In fact, it looks like mysql is smart enough to know that you've negated the OUTER JOIN by putting conditions on the joined tables in the WHERE clause, and convert then to INNER JOINS. Don't rely on that! Use the correct join type. Those queries Regards, Gavin Towey -Original Message- From: Ciaran Lee [mailto:ciaran@gmail.com] Sent: Tuesday, September 22, 2009 1:32 PM To: mysql@lists.mysql.com Subject: query optimization question (my struggle against 'using temporary; using filesort') Hi, I hope this is the right place to ask a question about query optimization. Background: I have a database which has events, which occur in places (listings). Places have addresses, and addresses belong to a city. I can select the latest event within a particular city very efficiently (less than 2ms), but selecting the latest listing within a city is REALLY slow (10-20 seconds) despite being almost a subset of the event query. I have been working on this for about a day, and have tried all sorts of tweaks to the indexes but to no avail. I always seem to end up with 'using temporary; using filesort' as the 'extra' content in the explain result. If anyone has a suggestion for what I might do to fix this, I'd really appreciate it. If not, I could further de-normalize the database for performance reasons, but I would feel dirty for doing so. Here is the fast query (select the latest event within a particular city), and it's explain. SELECT events.*, listings.*, addresses.* FROM `events` LEFT OUTER JOIN `listings` ON `listings`.id = `events`.listing_id LEFT OUTER JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY events.id DESC LIMIT 1 ++-+---++++-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---++++-+--+--+-+ | 1 | SIMPLE | events| index | index_events_on_listing_id | PRIMARY | 4 | NULL |1 | | | 1 | SIMPLE | listings | eq_ref
query optimization question (my struggle against 'using temporary; using filesort')
Hi, I hope this is the right place to ask a question about query optimization. Background: I have a database which has events, which occur in places (listings). Places have addresses, and addresses belong to a city. I can select the latest event within a particular city very efficiently (less than 2ms), but selecting the latest listing within a city is REALLY slow (10-20 seconds) despite being almost a subset of the event query. I have been working on this for about a day, and have tried all sorts of tweaks to the indexes but to no avail. I always seem to end up with 'using temporary; using filesort' as the 'extra' content in the explain result. If anyone has a suggestion for what I might do to fix this, I'd really appreciate it. If not, I could further de-normalize the database for performance reasons, but I would feel dirty for doing so. Here is the fast query (select the latest event within a particular city), and it's explain. SELECT events.*, listings.*, addresses.* FROM `events` LEFT OUTER JOIN `listings` ON `listings`.id = `events`.listing_id LEFT OUTER JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY events.id DESC LIMIT 1 ++-+---++++-+--+--+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---++++-+--+--+-+ | 1 | SIMPLE | events| index | index_events_on_listing_id | PRIMARY | 4 | NULL |1 | | | 1 | SIMPLE | listings | eq_ref | PRIMARY | PRIMARY | 4 | ratemyarea.events.listing_id |1 | Using where | | 1 | SIMPLE | addresses | ref| index_addresses_on_parent_city_id_and_addressable_type,index_addresses_on_addressable_type_and_addressable_id,addressable_id_type_city | index_addresses_on_addressable_type_and_addressable_id | 773 | const, ratemyarea.listings.id |1 | Using where | ++-+---++++-+--+--+-+ Here is the slow query (select the latest listing within a particular city), and it's explain SELECT listings.*, addresses.* FROM `listings` LEFT OUTER JOIN `addresses` ON `addresses`.addressable_id = `listings`.id AND `addresses`.addressable_type = 'Listing' WHERE (addresses.parent_city_id = 3) ORDER BY listings.id DESC LIMIT 1 ++-+---+++--+-+-+---+--+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+---+++--+-+-+---+--+ | 1 | SIMPLE | addresses | ref| index_addresses_on_parent_city_id_and_addressable_type,index_addresses_on_addressable_type_and_addressable_id,addressable_id_type_city | addressable_id_type_city | 773 | const,const | 25680 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | listings | eq_ref | PRIMARY | PRIMARY | 4 | ratemyarea.addresses.addressable_id | 1 | | ++-+---+++--+-+-+---+--+ Here is the