RE: query optimization question (my struggle against 'using temporary; using filesort')

2009-09-24 Thread Gavin Towey
Hi Ciaran,

So I think there's a couple things going on:

1. The explain plan for your slow query looks wrong, such as mysql is 
confused.  It's possible your index statistics are incorrect.  Try ANALYZE 
TABLE  on listings and addresses.

I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order.  
That should get rid of the temp table and filesort operations and give faster 
results.

SELECT
  STRAIGHT_JOIN
  listings.*, addresses.*
  FROM
`listings`
JOIN
  `addresses` ON `addresses`.addressable_id = `listings`.id
  AND
  `addresses`.addressable_type = 'Listing'
  WHERE (addresses.parent_city_id = 3)
  ORDER BY listings.id DESC LIMIT 1



2. I need to make some comments about your table design:

This column is AWFUL:
`addressable_type` varchar(255) DEFAULT NULL,

Why have field that hold up to 255 characters and put a little string in it 
like Listing? Why does it matter?  Well it makes your indexes disasterously 
bloated:

KEY `index_addresses_on_parent_city_id_and_addressable_type`
(`parent_city_id`,`addressable_type`),


If you noticed in the explain plan, that index is 733 bytes *per row*.  
Especially using utf8 means each character takes 3 bytes in the index.  That's 
terrible. That type field should be a foreign key tinyint or at the very least 
be a much much shorter varchar field (such as 8 or 10)

You have lots of those varchar(255) fields, which looks like lazy design -- by 
not gathering correct requirements and designing accordingly you will hurt your 
database performance, waste disk space and cause yourself all kinds of future 
problems.

3.  Why are you using OUTER JOIN?

It looks to me like you're using it because you don't know the difference, 
since you're not looking for NULL rows or anything.  In fact, it looks like 
mysql is smart enough to know that you've negated the OUTER JOIN by putting 
conditions on the joined tables in the WHERE clause, and convert then to INNER 
JOINS.  Don't rely on that!  Use the correct join type.

Those queries
Regards,
Gavin Towey

-Original Message-
From: Ciaran Lee [mailto:ciaran@gmail.com]
Sent: Tuesday, September 22, 2009 1:32 PM
To: mysql@lists.mysql.com
Subject: query optimization question (my struggle against 'using temporary; 
using filesort')

Hi,

I hope this is the right place to ask a question about query optimization.

Background:
I have a database which has events, which occur in places (listings). Places
have addresses, and addresses belong to a city. I can select the latest
event within a particular city very efficiently (less than 2ms), but
selecting the latest listing within a city is REALLY slow (10-20 seconds)
despite being almost a subset of the event query.

I have been working on this for about a day, and have tried all sorts of
tweaks to the indexes but to no avail. I always seem to end up with 'using
temporary; using filesort' as the 'extra' content in the explain result. If
anyone has a suggestion for what I might do to fix this, I'd really
appreciate it. If not, I could further de-normalize the database for
performance reasons, but I would feel dirty for doing so.

Here is the fast query (select the latest event within a particular city),
and it's explain.
SELECT
  events.*, listings.*, addresses.*
  FROM
`events`
LEFT OUTER JOIN
  `listings` ON `listings`.id = `events`.listing_id
LEFT OUTER JOIN
  `addresses` ON `addresses`.addressable_id = `listings`.id
  AND
  `addresses`.addressable_type = 'Listing'
  WHERE (addresses.parent_city_id = 3)
  ORDER BY events.id DESC LIMIT 1

  
++-+---++++-+--+--+-+
  | id | select_type | table | type   | possible_keys

   | key
   | key_len | ref  | rows | Extra   |
  
++-+---++++-+--+--+-+
  |  1 | SIMPLE  | events| index  | index_events_on_listing_id

  | PRIMARY
   | 4   | NULL |1 | |
  |  1 | SIMPLE  | listings  | eq_ref | PRIMARY

   | PRIMARY
   | 4   | ratemyarea.events.listing_id |1 | Using where |
  |  1 | SIMPLE  | addresses | ref|
index_addresses_on_parent_city_id_and_addressable_type,index_addresses_on_addressable_type_and_addressable_id,addressable_id_type_city
| index_addresses_on_addressable_type_and_addressable_id | 773 | const,
ratemyarea.listings.id |1 | Using

Re: query optimization question (my struggle against 'using temporary; using filesort')

2009-09-24 Thread Ciaran Lee
Hi Gavin,Thanks very much, I'll implement as many of your suggestions as
possible. The varchar(255)'s are inexcusable and I feel suitably ashamed :)
The queries were generated by ActiveRecord (an ORM library for Ruby),
although even if I had written them myself they would probably not be much
better.
Regards,
Ciaran Lee

2009/9/24 Gavin Towey gto...@ffn.com

 Hi Ciaran,

 So I think there's a couple things going on:

 1. The explain plan for your slow query looks wrong, such as mysql is
 confused.  It's possible your index statistics are incorrect.  Try ANALYZE
 TABLE  on listings and addresses.

 I think a sure way to fix it is to add STRAIGHT_JOIN to force the join
 order.  That should get rid of the temp table and filesort operations and
 give faster results.

 SELECT
  STRAIGHT_JOIN
  listings.*, addresses.*
  FROM
`listings`
 JOIN
  `addresses` ON `addresses`.addressable_id = `listings`.id
  AND
  `addresses`.addressable_type = 'Listing'
  WHERE (addresses.parent_city_id = 3)
  ORDER BY listings.id DESC LIMIT 1



 2. I need to make some comments about your table design:

 This column is AWFUL:
 `addressable_type` varchar(255) DEFAULT NULL,

 Why have field that hold up to 255 characters and put a little string in it
 like Listing? Why does it matter?  Well it makes your indexes
 disasterously bloated:

 KEY `index_addresses_on_parent_city_id_and_addressable_type`
 (`parent_city_id`,`addressable_type`),


 If you noticed in the explain plan, that index is 733 bytes *per row*.
  Especially using utf8 means each character takes 3 bytes in the index.
  That's terrible. That type field should be a foreign key tinyint or at the
 very least be a much much shorter varchar field (such as 8 or 10)

 You have lots of those varchar(255) fields, which looks like lazy design --
 by not gathering correct requirements and designing accordingly you will
 hurt your database performance, waste disk space and cause yourself all
 kinds of future problems.

 3.  Why are you using OUTER JOIN?

 It looks to me like you're using it because you don't know the difference,
 since you're not looking for NULL rows or anything.  In fact, it looks like
 mysql is smart enough to know that you've negated the OUTER JOIN by putting
 conditions on the joined tables in the WHERE clause, and convert then to
 INNER JOINS.  Don't rely on that!  Use the correct join type.

 Those queries
 Regards,
 Gavin Towey

 -Original Message-
 From: Ciaran Lee [mailto:ciaran@gmail.com]
 Sent: Tuesday, September 22, 2009 1:32 PM
 To: mysql@lists.mysql.com
 Subject: query optimization question (my struggle against 'using temporary;
 using filesort')

 Hi,

 I hope this is the right place to ask a question about query optimization.

 Background:
 I have a database which has events, which occur in places (listings).
 Places
 have addresses, and addresses belong to a city. I can select the latest
 event within a particular city very efficiently (less than 2ms), but
 selecting the latest listing within a city is REALLY slow (10-20 seconds)
 despite being almost a subset of the event query.

 I have been working on this for about a day, and have tried all sorts of
 tweaks to the indexes but to no avail. I always seem to end up with 'using
 temporary; using filesort' as the 'extra' content in the explain result. If
 anyone has a suggestion for what I might do to fix this, I'd really
 appreciate it. If not, I could further de-normalize the database for
 performance reasons, but I would feel dirty for doing so.

 Here is the fast query (select the latest event within a particular city),
 and it's explain.
 SELECT
  events.*, listings.*, addresses.*
  FROM
`events`
LEFT OUTER JOIN
  `listings` ON `listings`.id = `events`.listing_id
LEFT OUTER JOIN
  `addresses` ON `addresses`.addressable_id = `listings`.id
  AND
  `addresses`.addressable_type = 'Listing'
  WHERE (addresses.parent_city_id = 3)
  ORDER BY events.id DESC LIMIT 1


  
 ++-+---++++-+--+--+-+
  | id | select_type | table | type   | possible_keys

   | key
   | key_len | ref  | rows | Extra   |

  
 ++-+---++++-+--+--+-+
  |  1 | SIMPLE  | events| index  | index_events_on_listing_id

  | PRIMARY
   | 4   | NULL |1 | |
  |  1 | SIMPLE  | listings  | eq_ref

query optimization question (my struggle against 'using temporary; using filesort')

2009-09-22 Thread Ciaran Lee
Hi,

I hope this is the right place to ask a question about query optimization.

Background:
I have a database which has events, which occur in places (listings). Places
have addresses, and addresses belong to a city. I can select the latest
event within a particular city very efficiently (less than 2ms), but
selecting the latest listing within a city is REALLY slow (10-20 seconds)
despite being almost a subset of the event query.

I have been working on this for about a day, and have tried all sorts of
tweaks to the indexes but to no avail. I always seem to end up with 'using
temporary; using filesort' as the 'extra' content in the explain result. If
anyone has a suggestion for what I might do to fix this, I'd really
appreciate it. If not, I could further de-normalize the database for
performance reasons, but I would feel dirty for doing so.

Here is the fast query (select the latest event within a particular city),
and it's explain.
SELECT
  events.*, listings.*, addresses.*
  FROM
`events`
LEFT OUTER JOIN
  `listings` ON `listings`.id = `events`.listing_id
LEFT OUTER JOIN
  `addresses` ON `addresses`.addressable_id = `listings`.id
  AND
  `addresses`.addressable_type = 'Listing'
  WHERE (addresses.parent_city_id = 3)
  ORDER BY events.id DESC LIMIT 1

  
++-+---++++-+--+--+-+
  | id | select_type | table | type   | possible_keys

   | key
   | key_len | ref  | rows | Extra   |
  
++-+---++++-+--+--+-+
  |  1 | SIMPLE  | events| index  | index_events_on_listing_id

  | PRIMARY
   | 4   | NULL |1 | |
  |  1 | SIMPLE  | listings  | eq_ref | PRIMARY

   | PRIMARY
   | 4   | ratemyarea.events.listing_id |1 | Using where |
  |  1 | SIMPLE  | addresses | ref|
index_addresses_on_parent_city_id_and_addressable_type,index_addresses_on_addressable_type_and_addressable_id,addressable_id_type_city
| index_addresses_on_addressable_type_and_addressable_id | 773 | const,
ratemyarea.listings.id |1 | Using where |
  
++-+---++++-+--+--+-+


Here is the slow query (select the latest listing within a particular city),
and it's explain
SELECT
  listings.*, addresses.*
  FROM
`listings`
LEFT OUTER JOIN
  `addresses` ON `addresses`.addressable_id = `listings`.id
  AND
  `addresses`.addressable_type = 'Listing'
  WHERE (addresses.parent_city_id = 3)
  ORDER BY listings.id DESC LIMIT 1

  
++-+---+++--+-+-+---+--+
  | id | select_type | table | type   | possible_keys

   | key  | key_len | ref
  | rows  | Extra
 |
  
++-+---+++--+-+-+---+--+
  |  1 | SIMPLE  | addresses | ref|
index_addresses_on_parent_city_id_and_addressable_type,index_addresses_on_addressable_type_and_addressable_id,addressable_id_type_city
| addressable_id_type_city | 773 | const,const |
25680 | Using where; Using temporary; Using filesort |
  |  1 | SIMPLE  | listings  | eq_ref | PRIMARY

   | PRIMARY  | 4   |
ratemyarea.addresses.addressable_id | 1 |
   |
  
++-+---+++--+-+-+---+--+


Here is the