RE: query optimization question (my struggle against 'using temporary; using filesort')

2009-09-24 Thread Gavin Towey
Hi Ciaran,

So I think there's a couple things going on:

1. The explain plan for your slow query looks wrong, as if mysql is 
confused.  It's possible your index statistics are incorrect.  Try ANALYZE 
TABLE on listings and addresses.
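For reference, one statement covers both tables:

```sql
-- Refresh index statistics so the optimizer has accurate cardinality estimates
ANALYZE TABLE listings, addresses;
```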

I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order.  
That should get rid of the temp table and filesort operations and give faster 
results.

SELECT STRAIGHT_JOIN
  listings.*, addresses.*
FROM `listings`
JOIN `addresses`
  ON `addresses`.addressable_id = `listings`.id
  AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY listings.id DESC LIMIT 1



2. I need to make some comments about your table design:

This column is AWFUL:
`addressable_type` varchar(255) DEFAULT NULL,

Why have a field that can hold up to 255 characters just to store a little 
string like 'Listing'? Why does it matter?  Because it makes your indexes 
disastrously bloated:

KEY `index_addresses_on_parent_city_id_and_addressable_type`
(`parent_city_id`,`addressable_type`),


If you noticed in the explain plan, that index is 773 bytes *per row*.  
Using utf8 makes it worse: each character takes 3 bytes in the index.  That's 
terrible. That type field should be a foreign key tinyint or at the very least 
be a much, much shorter varchar field (such as 8 or 10).

You have lots of those varchar(255) fields, which looks like lazy design -- by 
not gathering correct requirements and designing accordingly you will hurt your 
database performance, waste disk space and cause yourself all kinds of future 
problems.
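If it helps, here is a rough sketch of that migration; the `addressable_types` lookup table and the new column name are made up for illustration, not part of your schema:

```sql
-- Map each type string to a 1-byte id instead of indexing a utf8 varchar(255).
CREATE TABLE addressable_types (
  id   TINYINT UNSIGNED NOT NULL PRIMARY KEY,
  name VARCHAR(32) NOT NULL
);
INSERT INTO addressable_types VALUES (1, 'Listing');

ALTER TABLE addresses ADD COLUMN addressable_type_id TINYINT UNSIGNED;

UPDATE addresses a
JOIN addressable_types t ON t.name = a.addressable_type
SET a.addressable_type_id = t.id;

-- Rebuild the bloated index on the narrow column: a few bytes per row
-- instead of the ~773 shown in the explain plan.
ALTER TABLE addresses
  DROP KEY index_addresses_on_parent_city_id_and_addressable_type,
  ADD KEY idx_city_type (parent_city_id, addressable_type_id);
```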

3.  Why are you using OUTER JOIN?

It looks to me like you're using it because you don't know the difference, 
since you're not looking for NULL rows or anything.  In fact, it looks like 
mysql is smart enough to see that you've negated the OUTER JOIN by putting 
conditions on the joined tables in the WHERE clause, and to convert them to 
INNER JOINs.  Don't rely on that!  Use the correct join type.

Regards,
Gavin Towey

-Original Message-
From: Ciaran Lee [mailto:ciaran@gmail.com]
Sent: Tuesday, September 22, 2009 1:32 PM
To: mysql@lists.mysql.com
Subject: query optimization question (my struggle against 'using temporary; 
using filesort')

Hi,

I hope this is the right place to ask a question about query optimization.

Background:
I have a database which has events, which occur in places (listings). Places
have addresses, and addresses belong to a city. I can select the latest
event within a particular city very efficiently (less than 2ms), but
selecting the latest listing within a city is REALLY slow (10-20 seconds)
despite being almost a subset of the event query.

I have been working on this for about a day, and have tried all sorts of
tweaks to the indexes but to no avail. I always seem to end up with 'using
temporary; using filesort' as the 'extra' content in the explain result. If
anyone has a suggestion for what I might do to fix this, I'd really
appreciate it. If not, I could further de-normalize the database for
performance reasons, but I would feel dirty for doing so.

Here is the fast query (select the latest event within a particular city),
and its explain.
SELECT
  events.*, listings.*, addresses.*
FROM `events`
LEFT OUTER JOIN `listings`
  ON `listings`.id = `events`.listing_id
LEFT OUTER JOIN `addresses`
  ON `addresses`.addressable_id = `listings`.id
  AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY events.id DESC LIMIT 1

  
  *************** 1. row ***************
        table: events      type: index    possible_keys: index_events_on_listing_id
          key: PRIMARY     key_len: 4     ref: NULL      rows: 1   Extra:
  *************** 2. row ***************
        table: listings    type: eq_ref   possible_keys: PRIMARY
          key: PRIMARY     key_len: 4     ref: ratemyarea.events.listing_id
         rows: 1           Extra: Using where
  *************** 3. row ***************
        table: addresses   type: ref
possible_keys: index_addresses_on_parent_city_id_and_addressable_type,
               index_addresses_on_addressable_type_and_addressable_id,
               addressable_id_type_city
          key: index_addresses_on_addressable_type_and_addressable_id
      key_len: 773         ref: const,ratemyarea.listings.id
         rows: 1           Extra: Using where
Re: query optimization question (my struggle against 'using temporary; using filesort')

2009-09-24 Thread Ciaran Lee
Hi Gavin,

Thanks very much, I'll implement as many of your suggestions as possible.
The varchar(255)'s are inexcusable and I feel suitably ashamed :)
The queries were generated by ActiveRecord (an ORM library for Ruby),
although even if I had written them myself they would probably not be much
better.
Regards,
Ciaran Lee

2009/9/24 Gavin Towey gto...@ffn.com

 [snip]
query optimization question (my struggle against 'using temporary; using filesort')

2009-09-22 Thread Ciaran Lee
Hi,

I hope this is the right place to ask a question about query optimization.

Background:
I have a database which has events, which occur in places (listings). Places
have addresses, and addresses belong to a city. I can select the latest
event within a particular city very efficiently (less than 2ms), but
selecting the latest listing within a city is REALLY slow (10-20 seconds)
despite being almost a subset of the event query.

I have been working on this for about a day, and have tried all sorts of
tweaks to the indexes but to no avail. I always seem to end up with 'using
temporary; using filesort' as the 'extra' content in the explain result. If
anyone has a suggestion for what I might do to fix this, I'd really
appreciate it. If not, I could further de-normalize the database for
performance reasons, but I would feel dirty for doing so.

Here is the fast query (select the latest event within a particular city),
and its explain.
SELECT
  events.*, listings.*, addresses.*
FROM `events`
LEFT OUTER JOIN `listings`
  ON `listings`.id = `events`.listing_id
LEFT OUTER JOIN `addresses`
  ON `addresses`.addressable_id = `listings`.id
  AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY events.id DESC LIMIT 1

  
  *************** 1. row ***************
        table: events      type: index    possible_keys: index_events_on_listing_id
          key: PRIMARY     key_len: 4     ref: NULL      rows: 1   Extra:
  *************** 2. row ***************
        table: listings    type: eq_ref   possible_keys: PRIMARY
          key: PRIMARY     key_len: 4     ref: ratemyarea.events.listing_id
         rows: 1           Extra: Using where
  *************** 3. row ***************
        table: addresses   type: ref
possible_keys: index_addresses_on_parent_city_id_and_addressable_type,
               index_addresses_on_addressable_type_and_addressable_id,
               addressable_id_type_city
          key: index_addresses_on_addressable_type_and_addressable_id
      key_len: 773         ref: const,ratemyarea.listings.id
         rows: 1           Extra: Using where


Here is the slow query (select the latest listing within a particular city),
and its explain.
SELECT
  listings.*, addresses.*
FROM `listings`
LEFT OUTER JOIN `addresses`
  ON `addresses`.addressable_id = `listings`.id
  AND `addresses`.addressable_type = 'Listing'
WHERE (addresses.parent_city_id = 3)
ORDER BY listings.id DESC LIMIT 1

  
  *************** 1. row ***************
        table: addresses   type: ref
possible_keys: index_addresses_on_parent_city_id_and_addressable_type,
               index_addresses_on_addressable_type_and_addressable_id,
               addressable_id_type_city
          key: addressable_id_type_city
      key_len: 773         ref: const,const
         rows: 25680       Extra: Using where; Using temporary; Using filesort
  *************** 2. row ***************
        table: listings    type: eq_ref   possible_keys: PRIMARY
          key: PRIMARY     key_len: 4     ref: ratemyarea.addresses.addressable_id
         rows: 1           Extra:


Here is the 

Re: Query Optimization Question

2006-03-14 Thread Michael Stassen

[EMAIL PROTECTED] wrote:
Yes, a ranged query should respond faster than a negation. In some cases 
you can seriously improve query performance for a negation query if you 
split it into two range queries unioned together. Here is a pseudo 
example:


This query should be slow due to the table scan it takes to test the 
condition:


SELECT ...
FROM ...
WHERE indexfield != 16


This query will use the index if the number of rows with indexfield != 16 is 
small enough (< about 30%).


This query should be noticeably faster due to using ranged queries 
(partial index ranged matches):


(   SELECT ...
    FROM ...
    WHERE indexfield < 16
) UNION (
    SELECT ...
    FROM ...
    WHERE indexfield > 16
)


At the very least, you'll want to make that UNION ALL, rather than just UNION, 
so mysql doesn't waste time looking for duplicates to remove.  But even then, 
this may be slower.


First, these two unioned queries may still not use the index.  If the number of 
rows with indexfield < 16 is too large (more than about 30%), the first query 
will be a table scan.  If the number of rows with indexfield > 16 is too large 
(more than about 30%), the second query will be a table scan.  In fact, if the 
number of rows with indexfield = 16 is less than about 40% of the table, then at 
least one of the two unioned queries is guaranteed to be a table scan.  Worse 
yet, this query stands a good chance of taking twice as long as the single != 
query, because it may require 2 table scans!


Second, even if both unioned queries use the index, the result still may not be 
faster.  If the combined number of matching rows is too high, the full table 
scan should be faster than the indexed lookups.


For example:

  SELECT cat, COUNT(*) FROM inits GROUP BY cat;
  +------+----------+
  | cat  | COUNT(*) |
  +------+----------+
  |    0 |     5743 |
  |    1 |     3792 |
  |    2 |    30727 |
  |    3 |     1926 |
  |    4 |     7812 |
  +------+----------+

19273 rows (38.55%) match cat != 2, with roughly half (~19%) on either side.

First, the != case:

  EXPLAIN SELECT * FROM inits WHERE cat != 2 \G
  *************** 1. row ***************
             id: 1
    select_type: SIMPLE
          table: inits
           type: ALL
  possible_keys: cat_idx
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 5
          Extra: Using where

As expected, a full table scan.

  SELECT * FROM inits WHERE cat != 2;
  ...
  19273 rows in set (0.37 sec)

Now, the unioned range queries:

  EXPLAIN SELECT * FROM inits WHERE cat < 2
  UNION ALL
  SELECT * FROM inits WHERE cat > 2 \G
  *************** 1. row ***************
             id: 1
    select_type: PRIMARY
          table: inits
           type: range
  possible_keys: cat_idx
            key: cat_idx
        key_len: 5
            ref: NULL
           rows: 5680
          Extra: Using where
  *************** 2. row ***************
             id: 2
    select_type: UNION
          table: inits
           type: range
  possible_keys: cat_idx
            key: cat_idx
        key_len: 5
            ref: NULL
           rows: 6543
          Extra: Using where
  *************** 3. row ***************
             id: NULL
    select_type: UNION RESULT
          table: union1,2
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: NULL
          Extra:


As hoped, mysql plans to use the index for each query.

  SELECT * FROM inits WHERE cat < 2
  UNION ALL
  SELECT * FROM inits WHERE cat > 2;
  ...
  19273 rows in set (0.78 sec)
  ...
  19273 rows in set (0.78 sec)

Despite (because of) using the index, this takes more than twice as long!

Of course, if you have to do a table scan ANYWAY (because a value you have 
in a constraint is not in an index) then this won't help. This 
optimization is all about getting the engine to use an index whenever it 
can instead of performing a table scan. Of course, that is not always 
possible.


Even when using an index is possible, it is not always desirable.  I'd suggest 
not trying to outfox the optimizer until you've first determined it is making 
bad choices, and then test to make sure the solution is actually an improvement.


Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Query Optimization Question

2006-03-13 Thread SGreen
Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM:

 In a previous database engine I was using, an IN was more optimal than a
 <>. So, for example:
 
 SELECT * FROM table WHERE table.type IN (1,2,3);
 
 Where the possible values of type are 0-3, was appreciably faster than:
 
 SELECT * FROM table WHERE table.type <> 0;
 
 I've been playing with the Query Browser and checking out the
 optimization documents and haven't been able to make a clear call on
 whether or not this is also the case with MySQL/InnoDB.
 
 TIA,
 
 R.
 
 

YES, YES, YES! This is definitely an optimization. 

When you say IN or =, you are asking for matching values. Matches can 
come from indexes. When you say <> or NOT IN, you are asking for 
everything BUT matches.  In order to evaluate a negative, the database 
engine (and this is usually true regardless of database server) almost 
always performs a full table scan to test every row to make sure it is 
either <> or NOT IN. At the very best, it has to perform a full index 
scan, which is still less efficient than ranged or values-based lookups.

It's when you get into the situation where you are matching against dozens 
of IN-clause items that you may run into slowdowns again. Until you reach 
2 or 3 dozen terms (depending on your hardware) you should be faster with 
an IN comparison than a <> or a NOT IN comparison. An optimization to 
search for everything BUT a term or two is to create a temporary table of all 
of your terms and delete the exact ones you want to exclude. Put an index on 
your temp table, then JOIN that back into your query again (replacing the huge 
IN clause).  The database will match index to index and things will get 
fast again. This technique can scale up to some really big queries.
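A sketch of that temp-table trick, with made-up table and column names (`t`, `type`):

```sql
-- Instead of WHERE t.type NOT IN (...) or t.type <> ..., build the list
-- of terms you DO want and join on it.
CREATE TEMPORARY TABLE allowed_types (type INT NOT NULL PRIMARY KEY)
  SELECT DISTINCT type FROM t;

-- Remove the one or two terms you wanted to exclude.
DELETE FROM allowed_types WHERE type = 0;

-- The join is an index-to-index match, so it stays fast.
SELECT t.*
FROM t
JOIN allowed_types a ON a.type = t.type;
```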

Always try to code for the affirmative tests. Your users will thank you.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

RE: Query Optimization Question

2006-03-13 Thread Robert DiFalco
Shawn,
 
Any performance gains for specifying type > 0 rather than type <> 0 ?
 
R.



 [snip]





RE: Query Optimization Question

2006-03-13 Thread SGreen
Yes, a ranged query should respond faster than a negation. In some cases 
you can seriously improve query performance for a negation query if you 
split it into two range queries unioned together. Here is a pseudo 
example:

This query should be slow due to the table scan it takes to test the 
condition:

SELECT ...
FROM ...
WHERE indexfield != 16

This query should be noticeably faster due to using ranged queries 
(partial index ranged matches):

(   SELECT ...
    FROM ...
    WHERE indexfield < 16
) UNION (
    SELECT ...
    FROM ...
    WHERE indexfield > 16
)

Of course, if you have to do a table scan ANYWAY (because a value you have 
in a constraint is not in an index) then this won't help. This 
optimization is all about getting the engine to use an index whenever it 
can instead of performing a table scan. Of course, that is not always 
possible.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Robert DiFalco [EMAIL PROTECTED] wrote on 03/13/2006 10:48:29 AM:

 [snip]
 


RE: Query Optimization Question

2006-03-13 Thread Robert DiFalco
Interesting, that seems like an optimization the query optimizer could
do itself when it sees a <> operator on an indexed numeric column.



 [snip]


Re: Query Optimization Question

2006-03-13 Thread Michael Stassen

Robert DiFalco wrote:
 In a previous database engine I was using, an IN was more optimal than a
 <>. So, for example:

 SELECT * FROM table WHERE table.type IN (1,2,3);

 Where the possible values of type are 0-3, was appreciably faster than:

 SELECT * FROM table WHERE table.type <> 0;

 I've been playing with the Query Browser and checking out the
 optimization documents and haven't been able to make a clear call on
 whether or not this is also the case with MySQL/InnoDB.

Mladen Adamovic wrote:
 IN should be faster implemented with both hash tables and BTREEs, so
 nowadays it should also be faster than <>, as all MySQL implementations
 AFAIK use those well-known data structures for indexes.

[EMAIL PROTECTED] wrote:
 [snip]

Implicit in Mladen and Shawn's answers, but never actually mentioned in the 
original post, is the presence of an index on the type column.  This is probably 
obvious to all concerned, but I mention it for completeness: without an index on 
type, there is no difference between type IN (1,2,3) and type != 0.  That 
is, the question is not whether IN is better than !=, but rather which will 
allow the optimizer to make good use of the index on type.


I find mysql's optimizer is pretty good with well-written queries, as long as 
subqueries aren't involved, so my initial reaction was to expect no difference. 
 After all, as the optimizer considers the WHERE conditions and the available 
indexes, it is certainly possible, at least theoretically, for it to notice that 
type IN (1,2,3) and type != 0 are identical conditions.  That is, a clever 
optimizer could treat them identically.  Shawn's and Mladen's answers gave me 
pause, however, and aroused my curiosity, so I decided to test:


  SELECT VERSION();
  +-----------+
  | VERSION() |
  +-----------+
  | 4.1.15    |
  +-----------+

  SELECT cat, COUNT(*) FROM inits GROUP BY cat;
  +------+----------+
  | cat  | COUNT(*) |
  +------+----------+
  |    0 |    44224 |
  |    1 |     1919 |
  |    2 |     1931 |
  |    3 |     1926 |
  +------+----------+

  mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (1,2,3) \G
  *************** 1. row ***************
             id: 1
    select_type: SIMPLE
          table: inits
           type: range
  possible_keys: cat_idx
            key: cat_idx
        key_len: 5
            ref: NULL
           rows: 8117
          Extra: Using where

  mysql> EXPLAIN SELECT * FROM inits WHERE cat != 0 \G
  *************** 1. row ***************
             id: 1
    select_type: SIMPLE
          table: inits
           type: range
  possible_keys: cat_idx
            key: cat_idx
        key_len: 5
            ref: NULL
           rows: 8120
          Extra: Using where

As you can see, the optimizer plans to use the index in both cases, examining 
8117 rows in the IN case and 8120 rows in the != case, to get 5776 matching rows 
out of 50,000 (about 11.55%).


On the other hand, it makes a difference how many rows will match.  What is the 
distribution of values of type?  If the number of matching rows is more than 
about 30% of the table, the optimizer won't use an available index in any case. 
 For example,


mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (0,1,2) \G
*************** 1. row ***************
           id: 1
  select_type: SIMPLE
        table: inits
         type: ALL
possible_keys: cat_idx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
        Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat != 3 \G
*************** 1. row ***************
           id: 1
  select_type: SIMPLE
        table: 

Query Optimization Question

2006-03-11 Thread Robert DiFalco
In a previous database engine I was using, an IN was more optimal than a
<>. So, for example:
 
SELECT * FROM table WHERE table.type IN (1,2,3);
 
Where the possible values of type are 0-3, was appreciably faster than:
 
SELECT * FROM table WHERE table.type <> 0;
 
I've been playing with the Query Browser and checking out the
optimization documents and haven't been able to make a clear call on
whether or not this is also the case with MySQL/InnoDB.
 
TIA,
 
R.
 





Re: Query Optimization Question

2006-03-11 Thread Mladen Adamovic

Robert DiFalco wrote:

In a previous database engine I was using, an IN was more optimal than a
<>. So, for example:
 
SELECT * FROM table WHERE table.type IN (1,2,3);
 
Where the possible values of type are 0-3, was appreciably faster than:
 
SELECT * FROM table WHERE table.type <> 0;
  
IN should be fast when implemented with either hash tables or B-trees, so
nowadays it should also be faster than <>, as all MySQL implementations
AFAIK use those well-known data structures for indexes.



 
I've been playing with the Query Browser and checking out the
optimization documents and haven't been able to make a clear call on
whether or not this is also the case with MySQL/InnoDB.
 
TIA,
 
R.

--
Mladen Adamovic
http://home.blic.net/adamm
http://www.shortopedia.com 
http://www.froola.com 







Query optimization question

2004-10-04 Thread Gerald Taylor
I am selecting from a single table, but it has a lot of rows and it has
a very involved calculation.  What I really want to do is
FIRST restrict the number of rows so that the big calculation is only
performed on the ones that are within 3 degrees.

Using 4.0.20
A sample query is given here:
The application interpolates variable values such as 44.6 into
the query string, so from mysql's
point of view they are constants, right?  And the explain doc
says it optimizes constants, but it is looking at all the rows
and I see why.
SELECT city, state, country, latitude, longitude,
IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP'[0-9]+$',
ROUND(DEGREES(ACOS((SIN(RADIANS(44.6))) *
(SIN(RADIANS(latitude))) +
(COS(RADIANS(44.6))) *
(COS(RADIANS(latitude))) *
(COS(RADIANS(-123.28 -longitude)
 * 111),) as distance  FROM londata
WHERE ABS(44.6-latitude) <= 3.0 AND ABS(-123.28-longitude) <= 3.0  ORDER 
BY distance;
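For reference, the arithmetic inside that IF() is the spherical law of cosines, scaled by roughly 111 km per degree of arc. A Python sketch of the same computation (the clamp guards against floating-point rounding pushing the cosine just outside ACOS's domain, which the SQL version leaves to chance):

```python
import math

# Python sketch of the distance expression in the query above:
# spherical law of cosines, scaled by ~111 km per degree of arc.
def distance_km(lat1, lon1, lat2, lon2):
    c = (math.sin(math.radians(lat1)) * math.sin(math.radians(lat2)) +
         math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) *
         math.cos(math.radians(lon1 - lon2)))
    # Clamp to [-1, 1] so rounding error can't push acos out of domain.
    c = max(-1.0, min(1.0, c))
    return round(math.degrees(math.acos(c)) * 111)

print(distance_km(44.6, -123.28, 44.6, -123.28))  # same point: 0
print(distance_km(44.6, -123.28, 45.6, -123.28))  # one degree north: ~111
```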

I guess I can't do a subselect with my version...
If I could what would it look like?
Something like below?   (I might be able to talk
the powers that be into an upgrade.)  And if I can't
is it more horrible to manually create a temporary table
and perform the calculations on it  than it is to
just do what I am doing?
SELECT city, state, country, latitude, longitude,
 IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP'[0-9]+$',
 ROUND(DEGREES(ACOS((SIN(RADIANS(44.6))) *
(SIN(RADIANS(latitude))) +
(COS(RADIANS(44.6))) *
(COS(RADIANS(latitude))) *
(COS(RADIANS(-123.28 -longitude)
 * 111),) as distance  FROM (SELECT * FROM  londata
   WHERE ABS(44.6-latitude) <= 3.0 AND ABS(-123.28-longitude) <= 3.0) 
as
   sublon  ORDER BY distance;

Thanks.
GT


Re: Query optimization question

2004-10-04 Thread gerald_clark


A subselect won't help.  You still have to scan the whole table to do the 
calculation for the WHERE clause.
Pre-calculate your min and max lat. and lon.:
WHERE latitude <= @maxlat AND latitude >= @minlat
  AND longitude <= @maxlon AND longitude >= @minlon
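Deriving those four bounds is a one-time computation per request. A hypothetical helper (the name and the 3-degree delta are illustrative, taken from the query in this thread):

```python
# Hypothetical helper for the precomputed bounds described above:
# compute the box once, then the WHERE clause compares the indexed
# latitude/longitude columns directly against constants.
def bounding_box(lat, lon, delta):
    return lat - delta, lat + delta, lon - delta, lon + delta

# Bounds for the 44.6 / -123.28 / 3-degree example from this thread.
minlat, maxlat, minlon, maxlon = bounding_box(44.6, -123.28, 3.0)
print(minlat, maxlat, minlon, maxlon)
```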



Re: Query optimization question

2004-10-04 Thread SGreen
When you don't have subselects, you have two options: temporary tables or 
JOINed queries.In your case, I think the temporary table is the better way 
to go. 

I would also eliminate the ABS() check so that I can compare values 
directly against the index. I know the math is correct your way but this 
way you are comparing values directly against the column which means that 
indexes can come into play.


SET @targetLat = 44.6, @targetLon = -123.28, @Delta = 3

CREATE TEMPORARY TABLE tmpDeltaData
SELECT city, state, country, latitude, longitude
FROM Londata
WHERE latitude BETWEEN (@targetLat - @Delta) AND (@targetLat + @Delta)
AND longitude BETWEEN (@targetLon - @Delta) AND (@targetLon + 
@Delta)


SELECT city, state, country, latitude, longitude,
IF(latitude REGEXP '[0-9\\.]+$' AND longitude REGEXP'[0-9\\.]+$',
ROUND(DEGREES(ACOS((SIN(RADIANS(@targetLat))) *
 (SIN(RADIANS(latitude))) +
 (COS(RADIANS(@targetLat))) *
 (COS(RADIANS(latitude))) *
 (COS(RADIANS(@targetLon -longitude)
  * 111),) as distance 
FROM tmpDeltaData
ORDER BY distance

DROP TEMPORARY TABLE tmpDeltaData

I would also test the WHERE clause from tmpDeltaData with your original 
query to compare speeds of the two methods.
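The two-step plan can be mocked up outside MySQL. Here is a small sqlite3 sketch (SQLite, not MySQL, and the sample rows are invented) of prefiltering into a temporary table so the expensive math only touches the survivors:

```python
import sqlite3

# sqlite3 mock-up (not MySQL; sample rows are invented) of the two-step
# approach: a temporary table holds the cheap bounding-box prefilter,
# then the expensive distance expression runs only over that small set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE londata (city TEXT, latitude REAL, longitude REAL)")
conn.executemany("INSERT INTO londata VALUES (?, ?, ?)",
                 [("Corvallis", 44.56, -123.26),
                  ("Portland", 45.52, -122.68),
                  ("Miami", 25.76, -80.19)])

lat, lon, delta = 44.6, -123.28, 3.0
conn.execute("""CREATE TEMPORARY TABLE tmpDeltaData AS
                SELECT * FROM londata
                WHERE latitude  BETWEEN ? AND ?
                  AND longitude BETWEEN ? AND ?""",
             (lat - delta, lat + delta, lon - delta, lon + delta))

# Only the two Oregon rows survive the prefilter; Miami never reaches
# the costly calculation.
rows = conn.execute("SELECT city FROM tmpDeltaData ORDER BY city").fetchall()
print(rows)
```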

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
