Re: Query Optimization

2012-11-16 Thread Benaya Paul
The query SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN'); will be
faster if the field is an ENUM.
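A minimal sketch of the idea (hypothetical DDL; only the table and column names come from the thread). MySQL stores ENUM values internally as small integer codes, which is what makes the comparison cheap:

```sql
-- Hypothetical sketch: VAL as an ENUM restricted to the three colours.
CREATE TABLE XYZ (
  id  INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  VAL ENUM('BLUE','RED','GREEN') NULL
);

-- ENUM members are stored as 1-byte internal codes, so this match is
-- cheaper than comparing VARCHAR strings.
SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
```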


On Fri, Nov 16, 2012 at 12:36 AM, Anupam Karmarkar
sb_akarmar...@yahoo.com wrote:

 Hi All,

 Consider a scenario: I have a table XYZ which contains the following values
 BLUE
 RED
 GREEN
 NULL

 following are the queries we can use to get these values:

 1. SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
 2. SELECT * FROM XYZ WHERE VAL IS NOT NULL
 3. SELECT * FROM XYZ WHERE VAL = 'BLUE' OR VAL='RED' OR VAL='GREEN'
 and more

 So which one is better in terms of optimization? I guess 1 and 3 are
 similar in terms of formation.


 --Anupam




-- 
Thanks & Regards,
P.Benaya Paul

http://www.codeasearch.com
http://www.iwannasearch.com


RE: Query Optimization

2012-11-16 Thread Rick James
It depends on the distribution of the 4 'values' in that field.
If the cardinality is poor, then INDEX(VAL) won't be used, and they will all do 
a table scan.
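One way to check which case applies is to add the index and ask the optimizer for its plan (a sketch; `idx_val` is a made-up index name):

```sql
-- Sketch: add the index, then see which access path the optimizer picks.
ALTER TABLE XYZ ADD INDEX idx_val (VAL);

-- With only four distinct values the optimizer may still prefer a full
-- table scan; EXPLAIN shows whether idx_val would actually be used.
EXPLAIN SELECT * FROM XYZ WHERE VAL IN ('BLUE','RED','GREEN');
```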

 -Original Message-
 From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com]
 Sent: Friday, November 16, 2012 12:36 AM
 To: mysql@lists.mysql.com
 Subject: Query Optimization
 

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql



query optimization

2011-09-22 Thread supr_star


 I have a table with 24 million rows and I need to figure out how to optimize a 
query.  It has to do with MAC addresses and RADIUS packets - I want to see the 
# of connections and the min/max date. So I basically want all this data:

  select cpe_mac, count(*) c, min(r3_dt) mindt, max(r3_dt) maxdt, max(rec_num) recn
  from radiuscap
  where r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
    and r3_type = 'Access'
  group by cpe_mac order by cpe_mac
;

This piece of the query takes 30 seconds to run and produces 3500 rows.  I have 
r3_dt indexed.  I also want a status field of the row with the highest r3_dt:

select rec_num,cpe_mac,req_status 
from rad_r3cap
where r3_type='Access'
  and (cpe_mac,r3_dt) in (select cpe_mac,max(r3_dt) from rad_r3cap)
;

This piece of the query takes forever; I let it run for an hour and it still 
didn't finish, so it's obviously not using indexes.  I have no idea how far along 
it got.  I wrote a PHP script to run the 1st query, then do 3500 individual 
lookups for the status using the max(rec_num) field from the 1st query, and I can 
get the data in 31 seconds.  So I CAN produce this data, but very slowly, and 
not in 1 SQL query.  I want to consolidate this into 1 SQL statement so I can make a view.

If anyone can point me in the right direction, I'd appreciate it!
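One common way to fold both pieces into a single statement is to join against a derived table that computes the per-MAC maximum. An untested sketch against the rad_r3cap schema shown below (carrying the 30-day filter over from the first query is an assumption):

```sql
-- Sketch: groupwise maximum via a derived table instead of an
-- uncorrelated IN subquery, so the GROUP BY runs exactly once.
SELECT r.rec_num, r.cpe_mac, r.req_status
FROM rad_r3cap AS r
JOIN (
    SELECT cpe_mac, MAX(r3_dt) AS maxdt
    FROM rad_r3cap
    WHERE r3_type = 'Access'
      AND r3_dt >= SUBDATE(NOW(), INTERVAL 30 DAY)
    GROUP BY cpe_mac
) AS m
  ON m.cpe_mac = r.cpe_mac
 AND m.maxdt  = r.r3_dt
WHERE r.r3_type = 'Access';
```

Note that the IN subquery above has no GROUP BY, so it yields a single global maximum rather than one per cpe_mac; the derived-table form also makes that intent explicit.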



mysql> desc rad_r3cap;
+-------------+-------------+------+-----+---------+----------------+
| Field       | Type        | Null | Key | Default | Extra          |
+-------------+-------------+------+-----+---------+----------------+
| rec_num     | int(11)     | NO   | PRI | NULL    | auto_increment |
| r3_dt       | datetime    | YES  | MUL | NULL    |                |
| r3_micros   | int(11)     | YES  |     | NULL    |                |
| r3_type     | varchar(16) | YES  |     | NULL    |                |
| req_status  | varchar(16) | YES  |     | NULL    |                |
| req_comment | varchar(64) | YES  |     | NULL    |                |
| asn_ip      | varchar(16) | YES  | MUL | NULL    |                |
| asn_name    | varchar(16) | YES  |     | NULL    |                |
| bsid        | varchar(12) | YES  | MUL | NULL    |                |
| cpe_ip      | varchar(16) | YES  |     | NULL    |                |
| cpe_mac     | varchar(12) | YES  | MUL | NULL    |                |
| filename    | varchar(32) | YES  |     | NULL    |                |
| linenum     | int(11)     | YES  |     | NULL    |                |
| r3_hour     | datetime    | YES  | MUL | NULL    |                |
| user_name   | varchar(64) | YES  |     | NULL    |                |
+-------------+-------------+------+-----+---------+----------------+

mysql> show indexes in rad_r3cap;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| rad_r3cap |          0 | PRIMARY      |            1 | rec_num     | A         |    23877677 |     NULL | NULL   |      | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            1 | r3_dt       | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          0 | r3cap_dt     |            2 | r3_micros   | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_bsid   |            1 | bsid        | A         |         346 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_asnip  |            1 | asn_ip      | A         |          55 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_cpemac |            1 | cpe_mac     | A         |        4758 |     NULL | NULL   | YES  | BTREE      |         |
| rad_r3cap |          1 | r3cap_date   |            1 | r3_hour     | A         |        1548 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
7 rows in set (0.00 sec)


Re: query optimization

2011-09-22 Thread Ananda Kumar
Your subquery, select cpe_mac,max(r3_dt) from rad_r3cap, is doing a full
table scan; you might want to check on this and add a WHERE condition on
an indexed column.

On Fri, Sep 23, 2011 at 12:14 AM, supr_star suprstar1...@yahoo.com wrote:





Re: Query Optimization

2011-09-08 Thread Brandon Phelps

On 09/01/2011 01:32 PM, Brandon Phelps wrote:

On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

On 9/1/2011 09:42, Brandon Phelps wrote:

On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
  ...

  WHERE
  (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30
00:00:00')
  AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30
12:36:53')

 In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'

 Now add an index over open_dt and close_dt and see what happens.

 Jochem

Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, i.e. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will set up a query window 
with two markers, (s) and (e). Events will be marked by |---| bars showing 
their durations.

a)         (s)............(e)
b) |---|
c)       |---|
d)             |---|
e)        |------------------|
f)                      |---|
g)                              |---|

To describe these situations:
a) is the window for which you want to query; (s) is the starting time and (e) 
is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part of 
your results.
c) the event starts before the window but ends within the window - include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include this
f) the event starts inside the window but ends beyond the window - include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need for a 
WHERE clause

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.
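Applied to the columns from this thread, that clause would look like the following sketch (the date literals are just the window bounds from the earlier example):

```sql
-- Sketch: an event overlaps the window [start, end] exactly when it
-- starts before the window ends AND ends after the window starts.
SELECT *
FROM firewall_connections
WHERE open_dt  <= '2011-08-30 12:36:53'  -- starts before the window ends
  AND close_dt >= '2011-08-30 00:00:00'; -- ends after the window starts
```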


Thanks Jochem and Shawn, however the following two queries result in the exact 
same EXPLAIN output (I hope the tables don't wrap too early for you):

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01 09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is 
ndx_open_close_dt):
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

Re: Query Optimization

2011-09-08 Thread Andrew Moore
Thinking outside the query, is there any archiving that could happen to make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelps bphe...@gls.com wrote:


Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Thanks for the reply Andy.  Unfortunately the users will be selecting varying 
date ranges and new data is constantly coming in, so I am not sure how I could 
archive/cache the necessary data that would be any more efficient than simply 
using the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote:

Thinking outside the query, is there any archiving that could happen to make
your large tables kinder in the range scan?

Andy


Re: Query Optimization

2011-09-08 Thread Andrew Moore
I don't think I saw any query timings in the emails (maybe I missed them).

What version of MySQL are you currently using?
What does the explain look like when you remove the LIMIT 10?
Is your server tuned for MyISAM or InnoDB?
What kind of disk setup is in use?
How much memory is in your machine?


On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelps bphe...@gls.com wrote:

 Thanks for the reply Andy.  Unfortunately the users will be selecting
 varying date ranges and new data is constantly coming in, so I am not sure
 how I could archive/cache the necessary data that would be any more
 efficient than simply using the database directly.




Re: Query Optimization

2011-09-08 Thread Mihail Manolov
If you're running version 5.1+ you may want to take a look at the table
partitioning options you have.
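A hedged sketch of what RANGE partitioning by month might look like here (the partition names are hypothetical; note that in MySQL 5.1 every unique key on the table must include the partitioning column, so the keys may need adjusting first):

```sql
-- Sketch: monthly RANGE partitions on open_dt, so date-bounded scans
-- only touch the relevant partitions (partition pruning).
ALTER TABLE firewall_connections
  PARTITION BY RANGE (TO_DAYS(open_dt)) (
    PARTITION p201108 VALUES LESS THAN (TO_DAYS('2011-09-01')),
    PARTITION p201109 VALUES LESS THAN (TO_DAYS('2011-10-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
  );
```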

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:

 Thanks for the reply Andy.  Unfortunately the users will be selecting varying 
 date ranges and new data is constantly coming in, so I am not sure how I 
 could archive/cache the necessary data that would be any more efficient than 
 simply using the database directly.
 
 

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Andy,

The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu 
server 11.04.  Unfortunately the machine only has 2GB of RAM but no other major 
daemons are running on the machine.  We are running RAID 1 (mirroring) with 1TB 
drives.  The tables in question here are all MyISAM.  When running with the 
LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL                | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:

I don't think I saw any query timings in the emails (maybe I missed them).

What version of MySQL are you currently using?
What does the explain look like when you remove the limit 10?
Is your server tuned for MyISAM or InnoDB?
What kind of disk setup is in use?
How much memory is in your machine?


On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelpsbphe...@gls.com  wrote:


Thanks for the reply Andy.  Unfortunately the users will be selecting
varying date ranges and new data is constantly coming in, so I am not sure
how I could archive/cache the necessary data that would be any more
efficient than simply using the database directly.



On 09/08/2011 02:16 PM, Andrew Moore wrote:


Thinking outside the query, is there any archiving that could happen to
make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com   wrote:

  On 09/01/2011 01:32 PM, Brandon Phelps wrote:


  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:


  On 9/1/2011 09:42, Brandon Phelps wrote:


  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:



...





WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')

In that case your logic here simplifies to:

WHERE
open_dt >= '2011-08-30 00:00:00'
AND
close_dt <= '2011-08-30 12:36:53'



  Now add an index over open_dt and close_dt and see what happens.




  Jochem




Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull
1
days worth of connections, I would miss that entry. Basically I want
to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start
and
end dates, I need to see that record.

Any other ideas?


  I believe Jochem was on the right track but he got his dates

reversed.

Let's try a little ASCII art to show the situation. I will setup a
query
window with two markers (s) and (e). Events will be marked by ||
markers
showing their durations.

a)         (s)            (e)
b) |---|
c)       |----|
d)            |-----|
e)     |----------------------|
f)                   |-------|
g)                               |---|

To describe these situations:
a) is the window for which you want to query; (s) is the starting 

Re: Query Optimization

2011-09-08 Thread Derek Downey
Correct me if I'm wrong. You're wanting to get all records that have an 
open_date or a close_date between two times.

If that's correct, you might be able to get an index_merge by doing a query 
like:

WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html

Regards,
Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:

 Andy,
 
 The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu 
 server 11.04.  Unfortunately the machine only has 2GB of RAM but no other 
 major daemons are running on the machine.  We are running RAID 1 (mirroring) 
 with 1TB drives.  The tables in question here are all MyISAM.  When running 
 with the LIMIT 10 my EXPLAIN is:
 
 +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
 | id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
 +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
 |  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
 |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
 |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
 +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
 
 When I remove the LIMIT 10 I get:
 
 +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
 | id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
 +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
 |  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL                | NULL    | NULL               | 32393330 | Using where; Using filesort |
 |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
 |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
 +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
 
 Thanks for all your help thus far.
 
 On 09/08/2011 02:38 PM, Andrew Moore wrote:
 I don't think I saw any query timings in the emails (maybe I missed them).
 
 What version of MySQL are you currently using?
 What does the explain look like when you remove the limit 10?
 Is your server tuned for MyISAM or InnoDB?
 What kind of disk setup is in use?
 How much memory is in your machine?
 
 
 On Thu, Sep 8, 2011 at 7:27 PM, Brandon Phelpsbphe...@gls.com  wrote:
 
 Thanks for the reply Andy.  Unfortunately the users will be selecting
 varying date ranges and new data is constantly coming in, so I am not sure
 how I could archive/cache the necessary data that would be any more
 efficient than simply using the database directly.
 
 
 
 On 09/08/2011 02:16 PM, Andrew Moore wrote:
 
 Thinking outside the query, is there any archiving that could happen to
 make
 your large tables kinder in the range scan?
 
 Andy
 
 On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com   wrote:
 
  On 09/01/2011 01:32 PM, Brandon Phelps wrote:
 
  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:
 
  On 9/1/2011 09:42, Brandon Phelps wrote:
 
  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:
 
 ...
 
 
  WHERE
 (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
 AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
 
  In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'
 
 
  Now add an index over open_dt and close_dt and see what happens.
 
 
  Jochem
 
 
 Jochem,
 
 I can't really use your WHERE logic because I also need to retrieve
 results where the open_dt time is out of the range specified. For
 example, a very large file download might span multiple days so given
 your logic if the connection was started 2 days ago and I want to pull
 1
 days worth of connections, I would miss that entry. Basically I want
 to
 SELECT all of the records that were opened 

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Mihail,

I have considered this but have not yet determined how best to go about 
partitioning the table.  I don't think partitioning by dst_address or 
src_address would help because most of the queries do not filter on IP address 
(except very specific queries where the end-user is searching the table for 
history on a particular employee).

I could potentially partition the table based on the day of week the connection 
was opened on which may improve performance for a while since this would take 
me from a single 32million record table down to roughly 4.5 million records per 
partition (32 / 7) however we are looking to eventually store up to 2 months 
worth of data in the table, and so far that 32 million records is only for 1 
month, so I estimate another 32 million-ish before the month is out, bringing 
me to roughly 70 million records total (it would be nice if I could store even 
more than 2 months, but given my currently performance dilemma I don't expect 
that to happen).  Also this does not take into account that the end-user will 
often be pulling data for multiple days at a time, meaning that multiple 
partitions in this scheme will need to be accessed anyway.

The only other logical partitioning scheme I can think of would be to partition 
based on dst_port (the port the log relates to) but the majority of records are 
all to port 80 (standard web traffic) so I don't think this would be all that 
helpful.

I have never messed with partitioning so it is totally possible that I am not 
thinking of something, so if you have any ideas on a decent partitioning scheme 
based on my criteria and queries below, please let me know.

Thanks,
Brandon
 


On 09/08/2011 02:47 PM, Mihail Manolov wrote:

If you're running version 5.1+ you may wanna take a look at table partitioning 
options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:


Thanks for the reply Andy.  Unfortunately the users will be selecting varying 
date ranges and new data is constantly coming in, so I am not sure how I could 
archive/cache the necessary data that would be any more efficient than simply 
using the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote:

Thinking outside the query, is there any archiving that could happen to make
your large tables kinder in the range scan?

Andy

On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com   wrote:


On 09/01/2011 01:32 PM, Brandon Phelps wrote:


On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:


On 9/1/2011 09:42, Brandon Phelps wrote:


On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

...



WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')


In that case your logic here simplifies to:
WHERE
open_dt >= '2011-08-30 00:00:00'
AND
close_dt <= '2011-08-30 12:36:53'



Now add an index over open_dt and close_dt and see what happens.



Jochem


Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will setup a query
window with two markers (s) and (e). Events will be marked by || markers
showing their durations.

a)         (s)            (e)
b) |---|
c)       |----|
d)            |-----|
e)     |----------------------|
f)                   |-------|
g)                               |---|

To describe these situations:
a) is the window for which you want to query; (s) is the starting time and
(e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be part
of your results.
c) the event starts before the window but ends within the window -
include this
d) the event starts and ends within the window - include this
e) the event starts before the window and ends after the window - include
this
f) the event starts inside the window but ends beyond the window -
include this.
g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need
for a WHERE clause

WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.



Thanks Jochem and Shawn, however the following two queries result in the
exact same EXPLAIN output: (I hope the tables don't wrap too early for you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS 

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Thanks for the idea Derek, however given the following query my EXPLAIN output 
is identical:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');


+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
|  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+

I did create indexes on open_dt and close_dt (2 separate indexes).



On 09/08/2011 02:55 PM, Derek Downey wrote:

Correct me if I'm wrong. You're wanting to get all records that have an 
open_date or a close_date between two times.

If that's correct, you might be able to get an index_merge by doing a query 
like:

WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time))

and creating two indexes (one on 'open_dt' and the other on 'close_dt')

http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html

Regards,
Derek

On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:


Andy,

The queries take minutes to run.  MySQL is 5.1.54 and it's running on Ubuntu 
server 11.04.  Unfortunately the machine only has 2GB of RAM but no other major 
daemons are running on the machine.  We are running RAID 1 (mirroring) with 1TB 
drives.  The tables in question here are all MyISAM.  When running with the 
LIMIT 10 my EXPLAIN is:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

When I remove the LIMIT 10 I get:

+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
| id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
|  1 | SIMPLE      | sc    | ALL    | ndx_open_close_rcvd | NULL                | NULL    | NULL               | 32393330 | Using where; Using filesort |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
+----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+

Thanks for all your help thus far.

On 09/08/2011 02:38 PM, Andrew Moore wrote:

I don't think I saw any query timings in the emails (maybe I missed them).

What version of MySQL are you currently using?
What does the explain look like when you remove the limit 10?
Is your server tuned for MyISAM or InnoDB?
What kind of disk setup is in use?
How much memory is 

Re: Query Optimization

2011-09-08 Thread Andrew Moore
Partitioning isn't a bad idea for this, however I'm still thinking about your
dataset size and possible hardware limitations. It's not likely going to fit
into the relevant buffers/memory, so you're going to be on disk more than you
want. You're probably creating temporary tables like crazy, and I would bet
that a good load of them are heading to disk too. With your IO
performance limited to a small number of disks as you describe, you're not
going to be able to get much more from these queries. Although it's a dedicated
DB server, are there other databases being accessed on it? When
looking at the scope of your data, are you capturing more than you need? How
often and how far back are the users querying? How many users are concurrently
performing queries on the 32m record table?

On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelps bphe...@gls.com wrote:

 Mihail,

 I have considered this but have not yet determined how best to go about
 partitioning the table.  I don't think partitioning by dst_address or
 src_address would help because most of the queries do not filter on IP
 address (except very specific queries where the end-user is searching the
 table for history on a particular employee).

 I could potentially partition the table based on the day of week the
 connection was opened on which may improve performance for a while since
 this would take me from a single 32million record table down to roughly 4.5
 million records per partition (32 / 7) however we are looking to eventually
 store up to 2 months worth of data in the table, and so far that 32 million
 records is only for 1 month, so I estimate another 32 million-ish before the
 month is out, bringing me to roughly 70 million records total (it would be
 nice if I could store even more than 2 months, but given my currently
 performance dilemma I don't expect that to happen).  Also this does not take
 into account that the end-user will often be pulling data for multiple days
 at a time, meaning that multiple partitions in this scheme will need to be
 accessed anyway.

 The only other logical partitioning scheme I can think of would be to
 partition based on dst_port (the port the log relates to) but the majority
 of records are all to port 80 (standard web traffic) so I don't think this
 would be all that helpful.

 I have never messed with partitioning so it is totally possible that I am
 not thinking of something, so if you have any ideas on a decent partitioning
 scheme based on my criteria and queries below, please let me know.

 Thanks,
 Brandon


 On 09/08/2011 02:47 PM, Mihail Manolov wrote:

 If you're running version 5.1+ you may wanna take a look at table
 partitioning options you may have.

 On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:

  Thanks for the reply Andy.  Unfortunately the users will be selecting
 varying date ranges and new data is constantly coming in, so I am not sure
 how I could archive/cache the necessary data that would be any more
 efficient than simply using the database directly.


 On 09/08/2011 02:16 PM, Andrew Moore wrote:

 Thinking outside the query, is there any archiving that could happen to
 make
 your large tables kinder in the range scan?

 Andy

 On Thu, Sep 8, 2011 at 7:03 PM, Brandon Phelpsbphe...@gls.com
 wrote:

  On 09/01/2011 01:32 PM, Brandon Phelps wrote:

  On 09/01/2011 12:47 PM, Shawn Green (MySQL) wrote:

  On 9/1/2011 09:42, Brandon Phelps wrote:

  On 09/01/2011 04:59 AM, Jochem van Dieten wrote:

 ...


  WHERE
 (open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
 AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30 12:36:53')
 
  In that case your logic here simplifies to:
 WHERE
 open_dt >= '2011-08-30 00:00:00'
 AND
 close_dt <= '2011-08-30 12:36:53'


  Now add an index over open_dt and close_dt and see what happens.


  Jochem


 Jochem,

 I can't really use your WHERE logic because I also need to retrieve
 results where the open_dt time is out of the range specified. For
 example, a very large file download might span multiple days so
 given
 your logic if the connection was started 2 days ago and I want to
 pull 1
 days worth of connections, I would miss that entry. Basically I want
 to
 SELECT all of the records that were opened OR closed during the
 specified time period, ie. if any activity happened between my start
 and
 end dates, I need to see that record.

 Any other ideas?


  I believe Jochem was on the right track but he got his dates
 reversed.

 Let's try a little ASCII art to show the situation. I will setup a
 query
 window with two markers (s) and (e). Events will be marked by ||
 markers
 showing their durations.

 a)         (s)            (e)
 b) |---|
 c)       |----|
 d)            |-----|
 e)     |----------------------|
 f)                   |-------|
 g)                               |---|

 To describe these situations:
 a) is the window for which you want to query; (s) is the starting time
 and
 (e) is the ending time for the date range you are interested in.
 b) the event starts and stops before your window exists. It won't be

Re: Query Optimization

2011-09-08 Thread Mihail Manolov
How about:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

UNION

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'



On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:

 Thanks for the idea Derek, however given the following query my EXPLAIN 
 output is identical:
 
 SELECT
   sc.open_dt,
   sc.close_dt,
   sc.protocol,
   INET_NTOA(sc.src_address) AS src_address,
   sc.src_port,
   INET_NTOA(sc.dst_address) AS dst_address,
   sc.dst_port,
   sc.sent,
   sc.rcvd,
   spm.desc AS src_port_desc,
   dpm.desc AS dst_port_desc
 FROM sonicwall_connections AS sc
   LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
   LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
 WHERE ('2011-09-07 13:18:58' <= open_dt <= '2011-09-08 13:18:58') OR ('2011-09-07 13:18:58' <= close_dt <= '2011-09-08 13:18:58');
 
 
 +----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
 | id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows     | Extra       |
 +----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
 |  1 | SIMPLE      | sc    | ALL    | NULL          | NULL    | NULL    | NULL               | 32393330 | Using where |
 |  1 | SIMPLE      | spm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.src_port |        1 |             |
 |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY       | PRIMARY | 2       | syslog.sc.dst_port |        1 |             |
 +----+-------------+-------+--------+---------------+---------+---------+--------------------+----------+-------------+
 
 I did create indexes on open_dt and close_dt (2 separate indexes).
 
 
 
 On 09/08/2011 02:55 PM, Derek Downey wrote:
 Correct me if I'm wrong. You're wanting to get all records that have an 
 open_date or a close_date between two times.
 
 If that's correct, you might be able to get an index_merge by doing a query 
 like:
 
 WHERE ((starting time) <= open_dt <= (ending time)) OR ((starting time) <= close_dt <= (ending time))
 
 and creating two indexes (one on 'open_dt' and the other on 'close_dt')
 
 http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html
 
 Regards,
 Derek
 
 On Sep 8, 2011, at 2:50 PM, Brandon Phelps wrote:
 
 Andy,
 
 The queries take minutes to run.  MySQL is 5.1.54 and it's running on 
 Ubuntu server 11.04.  Unfortunately the machine only has 2GB of RAM but no 
 other major daemons are running on the machine.  We are running RAID 1 
 (mirroring) with 1TB drives.  The tables in question here are all MyISAM.  
 When running with the LIMIT 10 my EXPLAIN is:
 
 +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
 | id | select_type | table | type   | possible_keys       | key                 | key_len | ref                | rows     | Extra                       |
 +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
 |  1 | SIMPLE      | sc    | range  | ndx_open_close_rcvd | ndx_open_close_rcvd | 8       | NULL               | 32393316 | Using where; Using filesort |
 |  1 | SIMPLE      | spm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.src_port |        1 |                             |
 |  1 | SIMPLE      | dpm   | eq_ref | PRIMARY             | PRIMARY             | 2       | syslog.sc.dst_port |        1 |                             |
 +----+-------------+-------+--------+---------------------+---------------------+---------+--------------------+----------+-----------------------------+
 
 When I remove the LIMIT 10 I get:
 
 

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Andrew,

Generally there is only 1 user performing the complicated SELECT query at a time, however 
the background process that fills the table is constantly doing a fast SELECT (0.3 
seconds) and a subsequent UPDATE.  Basically whenever a connection is closed on the 
firewall, the bg process SELECTS from the table the last record that was opened (and 
hasn't already been closed) for a given source IP, source port, dest ip, and protocol.  
So for example whenever the firewall logs a CONNECTION CLOSED message, the bg 
process does:

select id from sonicwall_connections where src_address = w.x.y.z and src_port = 
x and dst_address = w.x.y.z and protocol = x ORDER BY open_dt LIMIT 0, 1

then it UPDATES the close_dt column for the record with the selected ID.  These 
select/update statements being run by the background process generally take 
0.000x seconds each.
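The close-out bookkeeping described above can be sketched as follows, using sqlite3 and an invented, simplified schema (the real table also carries ports and protocol; the DESC ordering here reflects the stated intent of matching the *last* opened record, not necessarily the poster's exact statement):

```python
import sqlite3

# Invented minimal schema standing in for the firewall connections table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE connections (
    id INTEGER PRIMARY KEY, src TEXT, dst TEXT,
    open_dt TEXT, close_dt TEXT)""")
conn.execute("INSERT INTO connections VALUES "
             "(1, '10.0.0.5', '10.0.0.9', '2011-09-08 10:00:00', NULL)")
conn.execute("INSERT INTO connections VALUES "
             "(2, '10.0.0.5', '10.0.0.9', '2011-09-08 11:00:00', NULL)")

def close_connection(src, dst, when):
    """Stamp close_dt on the most recently opened still-open record."""
    row = conn.execute(
        """SELECT id FROM connections
           WHERE src = ? AND dst = ? AND close_dt IS NULL
           ORDER BY open_dt DESC LIMIT 1""", (src, dst)).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE connections SET close_dt = ? WHERE id = ?",
                 (when, row[0]))
    return row[0]

print(close_connection('10.0.0.5', '10.0.0.9', '2011-09-08 11:05:00'))  # 2
```

Each CONNECTION CLOSED log line triggers one such select/update pair; only the still-open record is touched.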

The only slowdown we see is with the complicated SELECT I have below, and this 
is only executed once in a while throughout the day.  For example, if an IT 
manager sees a huge bandwidth spike on the network, he may access the web gui 
to determine what the source IP is so he can determine who is downloading a 
large file, etc.

I think what I will try to do is create 60 partitions for the table in question 
based on month and day.  This way each day will go into its own partition and 
if someone runs the query for a date range such as 01/01/2011 - 01/07/2011 it 
will only need to access 7 partitions instead of the entire table.

My only question with this is how would I go about creating the table with 60 
partitions in such a way that I won't need to constantly add/remove new/old 
partitions every day?  Is there any partitioning function I can use that would 
not require me to update the partitions schema every day?  I already plan to 
have a cron run each night to purge records older than 60 days from the 
database.
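One way to get that "no daily maintenance" behavior (a suggestion, not something from the thread; verify the syntax against your server's manual) is a fixed modulo scheme such as MySQL 5.1's `PARTITION BY HASH( TO_DAYS(open_dt) ) PARTITIONS 60`. The rotation it relies on:

```python
from datetime import date

# date.toordinal() stands in for MySQL's TO_DAYS(); only the modulo matters.
# Each calendar day maps to one of 60 fixed buckets, and a bucket is reused
# exactly 60 days later, so partitions never need to be added or dropped.
def partition_for(d: date, partitions: int = 60) -> int:
    return d.toordinal() % partitions

print(partition_for(date(2011, 9, 8)) ==
      partition_for(date(2011, 11, 7)))   # True: the dates are 60 days apart
print(partition_for(date(2011, 9, 8)) ==
      partition_for(date(2011, 9, 9)))    # False: adjacent days differ
```

The planned nightly purge of 60-day-old rows would then empty each bucket just before its day number comes around again. One caveat to check: with HASH partitioning old rows are removed with DELETE, whereas a RANGE scheme would allow DROP PARTITION at the cost of periodic DDL.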

On 09/08/2011 03:26 PM, Andrew Moore wrote:

Partitioning isn't a bad idea for this, however I'm still thinking about your
dataset size and possible hardware limitations. It's not likely going to fit
into the relevant buffers/memory, so you're going to be on disk more than you
want. You're probably creating temporary tables like crazy, and I would bet
that a good load of them are heading to disk too. With your IO
performance limited to a small number of disks as you describe, you're not
going to be able to get much more from these queries. Although it's a dedicated
DB server, are there other databases being accessed on it? When
looking at the scope of your data, are you capturing more than you need? How
often and how far back are the users querying? How many users are concurrently
performing queries on the 32m record table?

On Thu, Sep 8, 2011 at 8:04 PM, Brandon Phelpsbphe...@gls.com  wrote:


Mihail,

I have considered this but have not yet determined how best to go about
partitioning the table.  I don't think partitioning by dst_address or
src_address would help because most of the queries do not filter on IP
address (except very specific queries where the end-user is searching the
table for history on a particular employee).

I could potentially partition the table based on the day of week the
connection was opened on which may improve performance for a while since
this would take me from a single 32million record table down to roughly 4.5
million records per partition (32 / 7) however we are looking to eventually
store up to 2 months worth of data in the table, and so far that 32 million
records is only for 1 month, so I estimate another 32 million-ish before the
month is out, bringing me to roughly 70 million records total (it would be
nice if I could store even more than 2 months, but given my currently
performance dilemma I don't expect that to happen).  Also this does not take
into account that the end-user will often be pulling data for multiple days
at a time, meaning that multiple partitions in this scheme will need to be
accessed anyway.

The only other logical partitioning scheme I can think of would be to
partition based on dst_port (the port the log relates to) but the majority
of records are all to port 80 (standard web traffic) so I don't think this
would be all that helpful.

I have never messed with partitioning so it is totally possible that I am
not thinking of something, so if you have any ideas on a decent partitioning
scheme based on my criteria and queries below, please let me know.

Thanks,
Brandon


On 09/08/2011 02:47 PM, Mihail Manolov wrote:


If you're running version 5.1+ you may wanna take a look at table
partitioning options you may have.

On Sep 8, 2011, at 2:27 PM, Brandon Phelps wrote:

  Thanks for the reply Andy.  Unfortunately the users will be selecting

varying date ranges and new data is constantly coming in, so I am not sure
how I could archive/cache the necessary data that would be any more
efficient than simply using the database directly.


On 09/08/2011 02:16 PM, Andrew Moore wrote:

Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Mihail,

Thanks so much!  I modified your example to include the proper ORDER BY and 
LIMIT clauses and this, so far, is running super fast (0.0007 seconds).  
Question, if a record's open_dt is between the range AND the close_dt is 
between the range as well, will the UNION output the record twice?  If so, is 
there any way to prevent that?

(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
UNION
(SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58')
ORDER BY rcvd DESC LIMIT 10;

+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
| id   | select_type  | table      | type   | possible_keys                              | key          | key_len | ref                | rows | Extra          |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+
|    1 | PRIMARY      | sc         | range  | open_dt,ndx_open_close_rcvd,ndx_open_close | open_dt      | 8       | NULL               | 1057 | Using where    |
|    1 | PRIMARY      | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    1 | PRIMARY      | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
|    2 | UNION        | sc         | range  | ndx_close_dt                               | ndx_close_dt | 8       | NULL               | 1131 | Using where    |
|    2 | UNION        | spm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.src_port |    1 |                |
|    2 | UNION        | dpm        | eq_ref | PRIMARY                                    | PRIMARY      | 2       | syslog.sc.dst_port |    1 |                |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                                       | NULL         | NULL    | NULL               | NULL | Using filesort |
+------+--------------+------------+--------+--------------------------------------------+--------------+---------+--------------------+------+----------------+



On 09/08/2011 03:45 PM, Mihail Manolov wrote:

How about:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE open_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'

UNION

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'



On Sep 8, 2011, at 3:25 PM, Brandon Phelps wrote:


Thanks for the idea Derek, however given the following query my EXPLAIN output 
is identical:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM sonicwall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings 

Re: Query Optimization

2011-09-08 Thread Mihail Manolov
From the manual: The default behavior for UNION is that duplicate rows are 
removed from the result.
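That deduplication is easy to verify on a toy table. A minimal sketch using Python's standard sqlite3 module; SQLite's UNION has the same duplicate-removal semantics as MySQL's, and the table and column names here are made up:

```python
# Hypothetical demo: a row whose open_dt AND close_dt both fall in the
# range matches both SELECTs, but plain UNION returns it only once.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, open_dt TEXT, close_dt TEXT)")
conn.execute(
    "INSERT INTO t VALUES (1, '2011-09-07 14:00:00', '2011-09-07 15:00:00')")

union_rows = conn.execute("""
    SELECT id FROM t WHERE open_dt  BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
    UNION
    SELECT id FROM t WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
""").fetchall()

union_all_rows = conn.execute("""
    SELECT id FROM t WHERE open_dt  BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
    UNION ALL
    SELECT id FROM t WHERE close_dt BETWEEN '2011-09-07 13:18:58' AND '2011-09-08 13:18:58'
""").fetchall()

print(len(union_rows), len(union_all_rows))  # UNION: 1 row, UNION ALL: 2 rows
```

If duplicates were acceptable, UNION ALL would skip the duplicate-removal pass entirely.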


Re: Query Optimization

2011-09-08 Thread Brandon Phelps

Ah I see.  Well thanks for your assistance!

-Brandon

On 09/08/2011 05:21 PM, Mihail Manolov wrote:

From the manual: The default behavior for UNION is that duplicate rows are removed
from the result.


Re: Query Optimization

2011-09-01 Thread Jochem van Dieten
On Aug 30, 2011 6:46 PM, Brandon Phelps wrote:
 SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA( sc.src_address ) AS src_address,
sc.src_port,
INET_NTOA( sc.dst_address ) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
 FROM firewall_connections AS sc

If this is a firewall connection log I presume open_dt is the time a
connection was opened and is always going to be less than close_dt. Right?

 WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30
00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30
12:36:53')

In that case your logic here simplifies to:
WHERE
  open_dt >= '2011-08-30 00:00:00'
  AND
  close_dt <= '2011-08-30 12:36:53'

Now add an index over open_dt and close_dt and see what happens.

Jochem


Re: Query Optimization

2011-09-01 Thread Brandon Phelps

Jochem,

I can't really use your WHERE logic because I also need to retrieve 
results where the open_dt time is out of the range specified.  For 
example, a very large file download might span multiple days so given 
your logic if the connection was started 2 days ago and I want to pull 1 
days worth of connections, I would miss that entry.  Basically I want to 
SELECT all of the records that were opened OR closed during the 
specified time period, ie. if any activity happened between my start and 
end dates, I need to see that record.


Any other ideas?

Thanks again,
Brandon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Query Optimization

2011-09-01 Thread Shawn Green (MySQL)

On 9/1/2011 09:42, Brandon Phelps wrote:

Jochem,

I can't really use your WHERE logic because I also need to retrieve
results where the open_dt time is out of the range specified. For
example, a very large file download might span multiple days so given
your logic if the connection was started 2 days ago and I want to pull 1
days worth of connections, I would miss that entry. Basically I want to
SELECT all of the records that were opened OR closed during the
specified time period, ie. if any activity happened between my start and
end dates, I need to see that record.

Any other ideas?



I believe Jochem was on the right track but he got his dates reversed.

Let's try a little ASCII art to show the situation. I will set up a query 
window with two markers (s) and (e).  Events will be marked by |---| 
markers showing their durations.

a)       (s)               (e)
b) |---|
c)     |-------|
d)           |-----|
e)    |---------------------------|
f)                        |-----|
g)                                |---|

To describe these situations:
a) is the window for which you want to query (s) is the starting time 
and (e) is the ending time for the date range you are interested in.
b) the event starts and stops before your window exists. It won't be 
part of your results.
c) the event starts before the window but ends within the window - 
include this

d) the event starts and ends within the window  - include this
e) the event starts before the window and ends after the window - 
include this
f) the event starts inside the window but ends beyond the window - 
include this.

g) the event starts and ends beyond the window - exclude this.

In order to get every event in the range of c-f, here is what you need 
for a WHERE clause


WHERE start <= (ending time) AND end >= (starting time)

Try that and let us know the results.
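The c-f window logic above can be sketched with Python's standard sqlite3 module (a stripped-down stand-in for the firewall_connections table discussed in this thread; the rows mirror cases b-g):

```python
# Hypothetical demo of the overlap predicate:
#   start <= (ending time) AND end >= (starting time)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conn_log (name TEXT, open_dt TEXT, close_dt TEXT)")
conn.executemany("INSERT INTO conn_log VALUES (?, ?, ?)", [
    ("b", "2011-08-29 01:00:00", "2011-08-29 02:00:00"),  # ends before (s)
    ("c", "2011-08-29 23:00:00", "2011-08-30 01:00:00"),  # straddles (s)
    ("d", "2011-08-30 02:00:00", "2011-08-30 03:00:00"),  # inside the window
    ("e", "2011-08-29 23:00:00", "2011-08-30 23:00:00"),  # spans the window
    ("f", "2011-08-30 11:00:00", "2011-08-30 13:00:00"),  # straddles (e)
    ("g", "2011-08-30 13:00:00", "2011-08-30 14:00:00"),  # starts after (e)
])

s, e = "2011-08-30 00:00:00", "2011-08-30 12:36:53"
rows = conn.execute(
    "SELECT name FROM conn_log WHERE open_dt <= ? AND close_dt >= ? ORDER BY name",
    (e, s)).fetchall()
print([r[0] for r in rows])  # ['c', 'd', 'e', 'f']
```

Only b (closed before the window) and g (opened after it) are excluded, exactly the c-f set described above.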
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN




Re: Query Optimization

2011-09-01 Thread Brandon Phelps


Thanks Jochem and Shawn, however the following two queries result in the exact 
same EXPLAIN output:   (I hope the tables don't wrap too early for you)

Old method:
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-31 09:53:31' OR close_dt >= '2011-08-31 09:53:31')
AND (open_dt <= '2011-09-01 09:53:31' OR close_dt <= '2011-09-01
09:53:31')
ORDER BY rcvd DESC
LIMIT 0, 10;

New method with BTREE index on open_dt, close_dt (index name is 
ndx_open_close_dt):
SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA(sc.src_address) AS src_address,
sc.src_port,
INET_NTOA(sc.dst_address) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
open_dt <= '2011-09-01 09:53:31' AND close_dt >= '2011-08-31 09:53:31'
ORDER BY rcvd DESC
LIMIT 0, 10;

EXPLAIN output for old method:
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
| id | select_type | table | type   | possible_keys             | key      | key_len | ref                | rows | Extra       |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | sc    | index  | open_dt,ndx_open_close_dt | ndx_rcvd | 4       | NULL               |   10 | Using where |
|  1 | SIMPLE      | spm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.src_port |    1 |             |
|  1 | SIMPLE      | dpm   | eq_ref | PRIMARY                   | PRIMARY  | 2       | syslog.sc.dst_port |    1 |             |
+----+-------------+-------+--------+---------------------------+----------+---------+--------------------+------+-------------+

EXPLAIN output for new method with new index:
+----+-------------+-------+--------+---------------+-----+---------+-----+------+-------+
| id | select_type | table | type   | possible_keys | key | key_len | ref | rows | Extra |

Query Optimization

2011-08-30 Thread Brandon Phelps

Hello,

I have the following query I'd like to optimize a bit:

SELECT
sc.open_dt,
sc.close_dt,
sc.protocol,
INET_NTOA( sc.src_address ) AS src_address,
sc.src_port,
INET_NTOA( sc.dst_address ) AS dst_address,
sc.dst_port,
sc.sent,
sc.rcvd,
spm.desc AS src_port_desc,
dpm.desc AS dst_port_desc
FROM firewall_connections AS sc
LEFT JOIN port_mappings AS spm ON spm.port = sc.src_port
LEFT JOIN port_mappings AS dpm ON dpm.port = sc.dst_port
WHERE
(open_dt >= '2011-08-30 00:00:00' OR close_dt >= '2011-08-30 00:00:00')
AND (open_dt <= '2011-08-30 12:36:53' OR close_dt <= '2011-08-30
12:36:53')
ORDER BY rcvd DESC
LIMIT 0 , 10

Currently I have an index on the rcvd column which seems to be working 
based on the output of EXPLAIN:


id  select_type  table  type    possible_keys  key       key_len  ref                 rows  Extra
1   SIMPLE       sc     index   open_dt        ndx_rcvd  4        NULL                10    Using where
1   SIMPLE       spm    eq_ref  PRIMARY        PRIMARY   2        syslog.sc.src_port  1
1   SIMPLE       dpm    eq_ref  PRIMARY        PRIMARY   2        syslog.sc.dst_port  1

However the query is still fairly slow for some reason, any ideas how I 
could speed it up with additional indexes, etc?


The values I am using in the WHERE clause are variable and change each time.

The table has around 23 million records right now but is expected to 
continue to grow up to a potential 150 million.


Here is the table schema:
CREATE TABLE IF NOT EXISTS `firewall_connections` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `open_dt` datetime NOT NULL,
  `close_dt` datetime NOT NULL,
  `protocol` smallint(6) NOT NULL,
  `src_address` int(10) unsigned NOT NULL,
  `src_port` smallint(5) unsigned NOT NULL,
  `dst_address` int(10) unsigned NOT NULL,
  `dst_port` smallint(5) unsigned NOT NULL,
  `sent` int(10) unsigned NOT NULL,
  `rcvd` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ndx_rcvd` (`rcvd`),
  KEY `ndx_sent` (`sent`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Thanks in advance!

--
Brandon




Re: Query Optimization

2011-08-30 Thread Michael Dykman
There are a few things going on, but mainly it is the ORs that are killing
you.

As your query requires OR to examine two distinct columns, both of equal relevance
to the query, MySQL is left with no choice but to do a full table scan on
what might be (at a guess) a very large table.  No amount of indexing will
fix this for the query presented.

You would be better off writing it as two distinct queries, each concerned
with conditions on a single column (open_dt and close_dt) and then UNIONing
the results. In this form, the indexes have a chance of being engaged.

Once the indexes are engaged, you probably want to cast your DATE/DATETIME
strings into actual DATEs or DATETIMEs, thus:

 ...
   (open_dt >= DATE('2011-08-30 00:00:00'))
 ...

In its current state, the DATE fields are being converted to strings
implicitly for every row tested which further frustrates index usage as the
index is against the quasi-numeric DATE, not the string representation which
your current implementation appears to expect.  This query would also
suddenly begin to fail entirely if the DEFAULT_DATE_FORMAT gets modified by
an admin or a future release of MySQL.  The explicit casting I have
suggested will protect you against that.

 - michael dykman




-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.


Query Optimization

2011-08-10 Thread Brandon Phelps

Hello all,

I am using the query below and variations of it to query a database with 
a TON of records.  Currently the database has around 11 million records 
but it grows every day and should cap out at around 150 million.


I am curious if there is any way I can better optimize the below query, 
as currently it takes this query around 10 seconds to run but I am sure 
this will get slower and slower as the database grows.


SELECT
open_dt,
close_dt,
protocol,
INET_NTOA(src_address) AS src_address,
src_port,
INET_NTOA(dst_address) AS dst_address,
dst_port,
sent,
rcvd
FROM connections
WHERE
dst_port = 80
ORDER BY close_dt  DESC
LIMIT 0, 30

I do have an index on the dst_port column, as you can see by the output 
of EXPLAIN:


id             1
select_type    SIMPLE
table          connections
type           ref
possible_keys  dst_port
key            dst_port
key_len        2
ref            const
rows           1109273
Extra          Using where; Using filesort

Thanks in advance,

--
Brandon




Re: Query Optimization

2011-08-10 Thread Brandon Phelps
Thanks Singer,  this took my query down to 0.0007, perfect!  I wasn't 
aware a single index of multiple columns would work when one of the 
columns was in the WHERE clause and the other in the ORDER BY clause. 
Learn something new every day I guess!
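A sketch of why the composite index removes the filesort, using Python's standard sqlite3 module (SQLite rather than MySQL, but the principle is the same: the equality column leads the index and the ORDER BY column follows, so rows come back already sorted; the index name here is made up):

```python
# Hypothetical demo: a (dst_port, close_dt) index serves both the
# WHERE equality and the ORDER BY, so the plan needs no separate sort.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections (dst_port INTEGER, close_dt TEXT)")
conn.execute("CREATE INDEX ndx_port_close ON connections (dst_port, close_dt)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT close_dt FROM connections
    WHERE dst_port = 80
    ORDER BY close_dt DESC
    LIMIT 30
""").fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)  # searches via ndx_port_close; no temp B-tree sort step
```

Dropping `close_dt` from the index would reintroduce a sort pass over every `dst_port = 80` row, which is what the filesort in the EXPLAIN above was doing.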



On 08/10/2011 02:03 PM, Singer X.J. Wang wrote:

Try an index on (dst_port, close_dt)



--
The best compliment you could give Pythian for our service is a referral.






Re: Query Optimization

2011-08-10 Thread Peter Brawley

On 8/10/2011 1:01 PM, Brandon Phelps wrote:


Did you try adding your ORDER BY argument close_dt to the index?

PB







Re: Query Optimization

2011-08-10 Thread Jan Steinman
 From: Brandon Phelps bphe...@gls.com
 
 I am curious if there is any way I can better optimize the below query, as 
 currently it takes this query around 10 seconds to run but I am sure this 
 will get slower and slower as the database grows.

You need an index on `close_dt`.

 SELECT
   open_dt,
   close_dt,
   protocol,
   INET_NTOA(src_address) AS src_address,
   src_port,
   INET_NTOA(dst_address) AS dst_address,
   dst_port,
   sent,
   rcvd
 FROM connections
 WHERE
   dst_port = 80
 ORDER BY close_dt  DESC
 LIMIT 0, 30


Current farmers, who have become mere operators of machines and mixers of 
chemicals, may not have the skills to develop a local, sustainable agriculture. 
A new generation of farmers, numbering in the tens of millions, will need to be 
trained and relocated to rural communities. -- Pat Murphy
 Jan Steinman, EcoReality Co-op 





Re: Need help with query optimization

2010-03-17 Thread John Daisley
It may only be returning 51 rows but it's having to read significantly more.

Get rid of the derived table join if possible. Something like

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName,
TAP.State,
TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved, C.ChapterType,
S.Region, S.District, Sum(Pts.Points) Total_Points
FROM TorchAwardParticipants TAP
 JOIN Members M On M.ID=TAP.CurrentMemberID
 JOIN Chapters C On C.ID=M.ChapterID
 JOIN Schools S On S.ID=C.SchoolID
 JOIN TorchAwardSelAct Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
AND Pts.LocalApproveStatus='A'
GROUP BY TAP.ID, M.UID, TAP.FirstName, TAP.MI,
TAP.LastName, TAP.State,
TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved, C.ChapterType,
S.Region, S.District
ORDER BY TAP.LastName, TAP.FirstName, Sum(Pts.Points)
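The equivalence of the two forms can be checked with Python's standard sqlite3 module (tiny hypothetical tables standing in for TorchAwardParticipants and TorchAwardSelAct): both return the same totals, but the direct join lets the optimizer use indexes on the base tables instead of materializing an unindexed derived table.

```python
# Hypothetical demo: derived-table join vs direct join + GROUP BY.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE points (achievement_id INTEGER, status TEXT, points INTEGER);
    INSERT INTO participants VALUES (1, 'a'), (2, 'b');
    INSERT INTO points VALUES (1, 'A', 10), (1, 'A', 5), (2, 'A', 7), (2, 'X', 99);
""")

derived = conn.execute("""
    SELECT p.name, pts.tot FROM participants p
    JOIN (SELECT achievement_id, SUM(points) AS tot
          FROM points WHERE status = 'A'
          GROUP BY achievement_id) pts ON pts.achievement_id = p.id
    ORDER BY p.name
""").fetchall()

direct = conn.execute("""
    SELECT p.name, SUM(pts.points) FROM participants p
    JOIN points pts ON pts.achievement_id = p.id
    WHERE pts.status = 'A'
    GROUP BY p.id, p.name
    ORDER BY p.name
""").fetchall()

print(derived == direct)  # True: [('a', 15), ('b', 7)]
```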
Regards
John







Need help with query optimization

2010-03-16 Thread Jesse

I have the following query:

SELECT TAP.ID, M.UID, TAP.FirstName, TAP.MI, TAP.LastName, TAP.State,
TAP.SchoolName, TAP.StateApproved, TAP.DiplomatApproved, C.ChapterType,
S.Region, S.District,Pts.TotPoints
FROM TorchAwardParticipants TAP
  JOIN Members M On M.ID=TAP.CurrentMemberID
  JOIN Chapters C On C.ID=M.ChapterID
  JOIN Schools S On S.ID=C.SchoolID
  JOIN (SELECT AchievementID,Sum(Points) As TotPoints
 FROM TorchAwardSelAct TASA
 WHERE LocalApproveStatus='A'
 GROUP BY AchievementID) Pts ON Pts.AchievementID=TAP.ID
WHERE TAP.State='OH' AND TAP.Sub='1' AND TAP.SubmittedToDiplomat IS NOT NULL
ORDER BY TAP.LastName, TAP.FirstName, Pts.TotPoints

The TorchAwardParticipants table has about 84,000 records in it.
The query takes almost 40 seconds to return the data, which is only 51 rows.
An EXPLAIN returns the following:
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
| id | select_type | table      | type   | possible_keys           | key           | key_len | ref                     | rows   | Extra                           |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                    | NULL          | NULL    | NULL                    |   4382 | Using temporary; Using filesort |
|  1 | PRIMARY     | TAP        | eq_ref | PRIMARY,CurrentMemberID | PRIMARY       | 4       | Pts.AchievementID       |      1 | Using where                     |
|  1 | PRIMARY     | M          | eq_ref | PRIMARY,IX_Members_3    | PRIMARY       | 4       | bpa.TAP.CurrentMemberID |      1 |                                 |
|  1 | PRIMARY     | C          | eq_ref | PRIMARY,IX_Chapters_1   | PRIMARY       | 4       | bpa.M.ChapterID         |      1 |                                 |
|  1 | PRIMARY     | S          | eq_ref | PRIMARY                 | PRIMARY       | 4       | bpa.C.SchoolID          |      1 |                                 |
|  2 | DERIVED     | TASA       | index  | NULL                    | AchievementID | 5       | NULL                    | 161685 | Using where                     |
+----+-------------+------------+--------+-------------------------+---------------+---------+-------------------------+--------+---------------------------------+

What is the best way to optimize this query so that it doesn't take 40 
seconds to return the dataset?


Jesse



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Need help with query optimization

2010-03-16 Thread Ananda Kumar
Can you please show us the indexes on both the tables.

regards
anandkl





VFP to MySQL Query Optimization

2009-10-20 Thread Matt Neimeyer
I feel like there should be a better way to do this... So I'm hoping
someone will be able to advise.

We have contacts that belong to organizations. We also have a document
tracker. Holding over from VFP you can have up to six organization ids
on the document and up to six contact ids. Right now the select to see
if a contact has documents looks like this...

SELECT * FROM Contacts WHERE
   (id IN (SELECT contid1 FROM documents)
   OR id IN (SELECT contid2 FROM documents)
   OR id IN (SELECT contid3 FROM documents)
   OR id IN (SELECT contid4 FROM documents)
   OR id IN (SELECT contid5 FROM documents)
   OR id IN (SELECT contid6 FROM documents)
   OR orgid IN (SELECT orgid1 FROM documents)
   OR orgid IN (SELECT orgid2 FROM documents)
   OR orgid IN (SELECT orgid3 FROM documents)
   OR orgid IN (SELECT orgid4 FROM documents)
   OR orgid IN (SELECT orgid5 FROM documents)
   OR orgid IN (SELECT orgid6 FROM documents))

Which is UGLY... and I feel like there should be a better way (I know
I could break that out into a many-many relationship via a third
linking table but I'm not 'able' to do that now.)

The only change I can think of is to union the two halves of the
select but I'm not sure if that would be better... (IE id in (select
contid1 from documents union select contid2 from documents) etc)

Any advice or is this the best I'm going to get until I can reorganize
the underlying structure?
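
The twelve IN subqueries above can be collapsed into one correlated EXISTS; a sketch using the column names from the post, untested against the real schema (it is logically equivalent, though not guaranteed to be faster than the UNION idea):

```sql
-- One probe into documents per contact: a match on any of the six
-- contact-id columns or any of the six org-id columns qualifies the contact.
SELECT c.*
FROM Contacts c
WHERE EXISTS (
    SELECT 1
    FROM documents d
    WHERE c.id    IN (d.contid1, d.contid2, d.contid3,
                      d.contid4, d.contid5, d.contid6)
       OR c.orgid IN (d.orgid1, d.orgid2, d.orgid3,
                      d.orgid4, d.orgid5, d.orgid6)
);
```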

Thanks!

Matt




RE: query optimization question (my struggle against 'using temporary; using filesort')

2009-09-24 Thread Gavin Towey
Hi Ciaran,

So I think there's a couple things going on:

1. The explain plan for your slow query looks wrong, as if MySQL is
confused.  It's possible your index statistics are incorrect.  Try ANALYZE
TABLE on listings and addresses.

I think a sure way to fix it is to add STRAIGHT_JOIN to force the join order.  
That should get rid of the temp table and filesort operations and give faster 
results.

SELECT
  STRAIGHT_JOIN
  listings.*, addresses.*
  FROM
`listings`
JOIN
  `addresses` ON `addresses`.addressable_id = `listings`.id
  AND
  `addresses`.addressable_type = 'Listing'
  WHERE (addresses.parent_city_id = 3)
  ORDER BY listings.id DESC LIMIT 1



2. I need to make some comments about your table design:

This column is AWFUL:
`addressable_type` varchar(255) DEFAULT NULL,

Why have a field that holds up to 255 characters just to put a little string in it
like Listing? Why does it matter?  Well, it makes your indexes disastrously
bloated:

KEY `index_addresses_on_parent_city_id_and_addressable_type`
(`parent_city_id`,`addressable_type`),


If you noticed in the explain plan, that index is 773 bytes *per row*.
Using utf8 makes it worse, since each character takes 3 bytes in the index. That's
terrible. That type field should be a tinyint foreign key, or at the very least
a much shorter varchar field (such as 8 or 10)

You have lots of those varchar(255) fields, which looks like lazy design -- by 
not gathering correct requirements and designing accordingly you will hurt your 
database performance, waste disk space and cause yourself all kinds of future 
problems.

3.  Why are you using OUTER JOIN?

It looks to me like you're using it because you don't know the difference,
since you're not looking for NULL rows or anything.  In fact, it looks like
MySQL is smart enough to see that you've negated the OUTER JOIN by putting
conditions on the joined tables in the WHERE clause, and converts them to INNER
JOINs.  Don't rely on that!  Use the correct join type.
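
A sketch of the narrower column from point 2 above; the length is an assumption and must cover the longest type name the application actually stores:

```sql
-- With utf8, VARCHAR(255) costs up to 767 bytes per index entry
-- (255 chars x 3 bytes + 2-byte length prefix); VARCHAR(10) costs at most 32.
ALTER TABLE addresses
  MODIFY addressable_type VARCHAR(10) NOT NULL DEFAULT '';
```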

Those queries
Regards,
Gavin Towey

Re: query optimization question (my struggle against 'using temporary; using filesort')

2009-09-24 Thread Ciaran Lee
Hi Gavin,
Thanks very much, I'll implement as many of your suggestions as possible. The
varchar(255)'s are inexcusable and I feel suitably ashamed :)
The queries were generated by ActiveRecord (an ORM library for Ruby),
although even if I had written them myself they would probably not be much
better.
Regards,
Ciaran Lee

2009/9/24 Gavin Towey gto...@ffn.com


query optimization question (my struggle against 'using temporary; using filesort')

2009-09-22 Thread Ciaran Lee
Hi,

I hope this is the right place to ask a question about query optimization.

Background:
I have a database which has events, which occur in places (listings). Places
have addresses, and addresses belong to a city. I can select the latest
event within a particular city very efficiently (less than 2ms), but
selecting the latest listing within a city is REALLY slow (10-20 seconds)
despite being almost a subset of the event query.

I have been working on this for about a day, and have tried all sorts of
tweaks to the indexes but to no avail. I always seem to end up with 'using
temporary; using filesort' as the 'extra' content in the explain result. If
anyone has a suggestion for what I might do to fix this, I'd really
appreciate it. If not, I could further de-normalize the database for
performance reasons, but I would feel dirty for doing so.

Here is the fast query (select the latest event within a particular city),
and its EXPLAIN:
SELECT
  events.*, listings.*, addresses.*
  FROM
`events`
LEFT OUTER JOIN
  `listings` ON `listings`.id = `events`.listing_id
LEFT OUTER JOIN
  `addresses` ON `addresses`.addressable_id = `listings`.id
  AND
  `addresses`.addressable_type = 'Listing'
  WHERE (addresses.parent_city_id = 3)
  ORDER BY events.id DESC LIMIT 1

  
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: events
         type: index
possible_keys: index_events_on_listing_id
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
        Extra:
*** 2. row ***
           id: 1
  select_type: SIMPLE
        table: listings
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: ratemyarea.events.listing_id
         rows: 1
        Extra: Using where
*** 3. row ***
           id: 1
  select_type: SIMPLE
        table: addresses
         type: ref
possible_keys: index_addresses_on_parent_city_id_and_addressable_type,index_addresses_on_addressable_type_and_addressable_id,addressable_id_type_city
          key: index_addresses_on_addressable_type_and_addressable_id
      key_len: 773
          ref: const,ratemyarea.listings.id
         rows: 1
        Extra: Using where


Here is the slow query (select the latest listing within a particular city),
and its EXPLAIN:
SELECT
  listings.*, addresses.*
  FROM
`listings`
LEFT OUTER JOIN
  `addresses` ON `addresses`.addressable_id = `listings`.id
  AND
  `addresses`.addressable_type = 'Listing'
  WHERE (addresses.parent_city_id = 3)
  ORDER BY listings.id DESC LIMIT 1

  
*** 1. row ***
           id: 1
  select_type: SIMPLE
        table: addresses
         type: ref
possible_keys: index_addresses_on_parent_city_id_and_addressable_type,index_addresses_on_addressable_type_and_addressable_id,addressable_id_type_city
          key: addressable_id_type_city
      key_len: 773
          ref: const,const
         rows: 25680
        Extra: Using where; Using temporary; Using filesort
*** 2. row ***
           id: 1
  select_type: SIMPLE
        table: listings
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: ratemyarea.addresses.addressable_id
         rows: 1
        Extra:


Here

Re: Hi Query Optimization steps

2009-08-26 Thread Joerg Bruehe
Hi!


bharani kumar wrote:
 Then which DB engine is better.
 

Which fruit is better, an apple or a banana?

And if X is better, why do farmers still grow Y, and why do people buy it?


More serious:
Different engines offer different features, as described in the manual.
The most obvious differences are full-text search (MyISAM) versus
transactions and referential integrity (InnoDB), but there are more.
During your database schema design you should select the appropriate
engine for each table.
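
A minimal sketch of that per-table choice, using hypothetical tables:

```sql
-- Full-text search requires MyISAM in MySQL of this vintage.
CREATE TABLE article (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body TEXT,
  FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM;

-- Transactional data with referential integrity belongs in InnoDB.
CREATE TABLE account (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  balance DECIMAL(12,2) NOT NULL
) ENGINE=InnoDB;
```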


Jörg

-- 
Joerg Bruehe,  MySQL Build Team,  joerg.bru...@sun.com
Sun Microsystems GmbH,   Komturstraße 18a,   D-12099 Berlin
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering Muenchen: HRB161028





RE: Hi Query Optimization steps

2009-08-26 Thread Jerry Schwartz


-Original Message-
From: joerg.bru...@sun.com [mailto:joerg.bru...@sun.com]
Sent: Wednesday, August 26, 2009 5:08 AM
To: mysql@lists.mysql.com
Cc: bharani kumar
Subject: Re: Hi Query Optimization steps

Hi!


bharani kumar wrote:
 Then which DB engine is better.


Which fruit is better, an apple or a banana?

And if X is better, why do farmers still grow Y, and why do people buy it?

[JS] Good answer. I did performance consulting for over 25 years, on *NIX
and other operating systems, and there was one response that suited all
questions: It depends.

Regards,
 
Jerry Schwartz
The Infoshop by Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032
 
860.674.8796 / FAX: 860.674.8341
 
www.the-infoshop.com










Re: Hi Query Optimization steps

2009-08-25 Thread bharani kumar
Then which DB engine is better.

Thanks




-- 
Regards
B.S.Bharanikumar
http://php-mysql-jquery.blogspot.com/


Re: Hi Query Optimization steps

2009-08-24 Thread mos

At 05:01 AM 8/22/2009, you wrote:

Hi all

Tell Me few Query Optimization  Tips 

For example.. some people saying that , avoid using more of and condition in
the query


Huh? You need to carefully plan your table structure and use indexes 
sparingly on the columns you are using in the Where clauses. An EXPLAIN in 
front of the Select statement will show you which indexes it is using.
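
For example (hypothetical table and index):

```sql
-- The key, rows, and Extra columns of the output show which index is
-- chosen and roughly how many rows MySQL expects to examine.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```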




This sort of very basic things please tell me ...

Also tell me , in the optimization view ,, which database engine is best


If you need transactions or RI then you have to use InnoDb. Otherwise 
MyISAM is quite fast. Again it depends on the type of queries you are 
executing.




Thanks

Bharanikumar


There is no magic wand to getting MySQL to run faster except planning and 
knowledge. It will take some effort on your part but it will be fun.


Take a look at High Performance MySQL 2nd Edition. Some of the first 
edition is online at 
http://books.google.com/books?id=sgMvu2uZXlsC&printsec=frontcover&dq=high+performance+mysql#v=onepage&q=&f=false


You can get the second edition at Amazon. Also you may need to read other 
MySQL books like MySQL Cookbook

http://books.google.com/books?id=aC3blgT3D3YC&printsec=frontcover&dq=mysql+cookbook#v=onepage&q=&f=false
The 2nd edition is at Amazon.

or MySQL by Paul DuBois is also quite good. These will get you started on 
the right foot.


Mike





Hi Query Optimization steps

2009-08-22 Thread bharani kumar
Hi all

Tell me a few query optimization tips.

For example, some people say to avoid using too many AND conditions in
the query.

Please tell me this sort of very basic thing.

Also tell me, from an optimization point of view, which database engine is best.

Thanks

Bharanikumar


Re: Query Optimization

2009-01-14 Thread Johnny Withers
The index hint is not in production code. I was trying to force it to use
the index even when using the OR clause; I meant to take that out before I
sent the email.

The table structure is:

CREATE TABLE `customer` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `ssn` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_ssn` (`ssn`)
) ENGINE=InnoDB

CREATE TABLE `customer_id` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `customer_id` int(10) unsigned DEFAULT NULL,
  `id_num` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `customer_key` (`customer_id`),
  KEY `id_id_num` (`id_num`)
) ENGINE=InnoDB
The explain output of the query using the OR clause:
*** 1. row ***
   id: 1
  select_type: SIMPLE
table: customer
 type: ALL
possible_keys: idx_ssn
  key: NULL
  key_len: NULL
  ref: NULL
 rows: 176680
Extra: Using where; Using temporary
*** 2. row ***
   id: 1
  select_type: SIMPLE
table: customer_id
 type: ref
possible_keys: customer_key
  key: customer_key
  key_len: 5
  ref: aca_ecash.customer.id
 rows: 1
Extra: Using where; Distinct
2 rows in set (0.00 sec)

Using a UNION results in:

*** 1. row ***
   id: 1
  select_type: PRIMARY
table: customer
 type: range
possible_keys: idx_ssn
  key: idx_ssn
  key_len: 35
  ref: NULL
 rows: 1
Extra: Using where; Using temporary
*** 2. row ***
   id: 1
  select_type: PRIMARY
table: customer_id
 type: ref
possible_keys: customer_key
  key: customer_key
  key_len: 5
  ref: aca_ecash.customer.id
 rows: 1
Extra: Using index; Distinct
*** 3. row ***
   id: 2
  select_type: UNION
table: customer_id
 type: range
possible_keys: customer_key,id_id_num
  key: id_id_num
  key_len: 35
  ref: NULL
 rows: 1
Extra: Using where; Using temporary
*** 4. row ***
   id: 2
  select_type: UNION
table: customer
 type: eq_ref
possible_keys: PRIMARY
  key: PRIMARY
  key_len: 4
  ref: aca_ecash.customer_id.customer_id
 rows: 1
Extra: Using where
*** 5. row ***
   id: NULL
  select_type: UNION RESULT
table: union1,2
 type: ALL
possible_keys: NULL
  key: NULL
  key_len: NULL
  ref: NULL
 rows: NULL
Extra:
5 rows in set (0.01 sec)

The union is much faster. I've tested the same search for ID numbers on our
test system (Windows 32-bit, 2GB RAM, P4 3GHz) and on the production system
(RHEL 64-bit, 16GB RAM, dual Xeon 2GHz): the search on the test system is
almost instant, while on the production system it takes 4 to 6 seconds, and
there's not much traffic on it today.

I'm going to put the UNION into production and see how it goes.

Thanks for the replies.

-johnny
On Tue, Jan 13, 2009 at 7:39 PM, Andrew Garner andrew.b.gar...@gmail.comwrote:

  On Tue, Jan 13, 2009 at 7:07 PM, Baron Schwartz ba...@xaprb.com wrote:
  If you have separate indexes on ssn and id_num, MySQL may be able to
  efficiently use an index merge optimization .   See
  http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html.
  This is only in 5.0+ - on older versions of MySQL you may find a union
  more efficient.
 
  And in newer versions, too.  The optimizer frequently underestimates
  the cost of the merge operation and the required random I/O for row
  lookups.  So, yes it can use an index merge, but... efficiency is
  another question.  I've seen table scans outperform a two-way index
  merge by orders of magnitude.
 
 These appeared to be high selectivity indexes, but perhaps I assumed
 too much. :)




-- 
-
Johnny Withers
601.209.4985
joh...@pixelated.net


Re: Query Optimization

2009-01-13 Thread Ken Menzel

Try a union instead of an or condition.
http://dev.mysql.com/doc/refman/5.0/en/union.html
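
Applied to the query from the original post, the UNION form would look roughly like this (untested; each branch can use its own single-column index, and UNION also removes duplicate rows):

```sql
SELECT customer.id, customer.ssn, customer_id.customer_id, customer_id.id_num
FROM customer
LEFT JOIN customer_id ON customer.id = customer_id.customer_id
WHERE customer.ssn = '123456789'
UNION
SELECT customer.id, customer.ssn, customer_id.customer_id, customer_id.id_num
FROM customer
JOIN customer_id ON customer.id = customer_id.customer_id
WHERE customer_id.id_num = '123456789';
```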

Johnny Withers wrote:

I have the following tables:

Customer: id,ssn
Customer_Id: id,customer_id,id_num

The customer table holds customers along with their SSN and the customer_id
table holds identifications for each customer (Driver's License, State
Issued ID, Student ID, etc). The SSN column from the customer table is
VARCHAR(9) and the id_num column from the customer_id table is VARCHAR(32).
Both of these columns have an index on them.

The following query uses the index on customer.ssn and executes in 0ms:

SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num
FROM customer USE INDEX(idx_ssn)
LEFT JOIN customer_id ON customer.id=customer_id.customer_id
WHERE ssn='123456789';

Explain output:

*** 1. row ***
   id: 1
  select_type: SIMPLE
table: customer
 type: ref
possible_keys: idx_ssn
  key: idx_ssn
  key_len: 35
  ref: const
 rows: 1
Extra: Using where; Using index
*** 2. row ***
   id: 1
  select_type: SIMPLE
table: customer_id
 type: ref
possible_keys: customer_key
  key: customer_key
  key_len: 5
  ref: aca_ecash.customer.id
 rows: 1
Extra:

Now, this is the query I have trouble with, it does not use the index (or
says it does but doesn't?) and on a busy system (200+ queries per sec) can
take up to 20 seconds or more to execute:

SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num
FROM customer USE INDEX(idx_ssn)
LEFT JOIN customer_id ON customer.id=customer_id.customer_id
WHERE ssn='123456789' OR id_num='123456789';

Explain output:

*** 1. row ***
   id: 1
  select_type: SIMPLE
table: customer
 type: index
possible_keys: idx_ssn
  key: idx_ssn
  key_len: 35
  ref: NULL
 rows: 165843
Extra: Using index
*** 2. row ***
   id: 1
  select_type: SIMPLE
table: customer_id
 type: ref
possible_keys: customer_key
  key: customer_key
  key_len: 5
  ref: aca_ecash.customer.id
 rows: 1
Extra: Using where


Is there some way I can make it use the index? I've thought about
redesigning the query to select from the customer_id table first and, if a row
is found, just return the matching customer_id from the customer table,
but I wanted to see if maybe I'm going about this the wrong way before I
engineer some way around this.

Thanks in advance,

-
Johnny Withers
601.209.4985
joh...@pixelated.net







Re: Query Optimization

2009-01-13 Thread Andrew Garner
Do you have an index on id_num?  What sort of explain output do you
get when you don't use a query hint? Your USE INDEX hint may be
causing MySQL to ignore a better strategy.

If you have separate indexes on ssn and id_num, MySQL may be able to
efficiently use an index merge optimization .   See
http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html.
This is only in 5.0+ - on older versions of MySQL you may find a union
more efficient.




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Query Optimization

2009-01-13 Thread Baron Schwartz
 If you have separate indexes on ssn and id_num, MySQL may be able to
 efficiently use an index merge optimization .   See
 http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html.
 This is only in 5.0+ - on older versions of MySQL you may find a union
 more efficient.

And in newer versions, too.  The optimizer frequently underestimates
the cost of the merge operation and the required random I/O for row
lookups.  So, yes it can use an index merge, but... efficiency is
another question.  I've seen table scans outperform a two-way index
merge by orders of magnitude.
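For a case like Johnny's OR across two tables (below in this thread), the union rewrite looks like this. This is a sketch using Python's bundled sqlite3 module on invented toy data; it only checks that the two forms return the same customers, since relative speed has to be measured on the real tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, ssn TEXT);
    CREATE TABLE customer_id (id INTEGER PRIMARY KEY, customer_id INTEGER, id_num TEXT);
    CREATE INDEX idx_ssn ON customer (ssn);
    CREATE INDEX idx_id_num ON customer_id (id_num);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "123456789"), (2, "987654321")])
conn.executemany("INSERT INTO customer_id VALUES (?, ?, ?)",
                 [(10, 2, "123456789"), (11, 1, "DL-555")])

# The original OR form: the predicate spans both tables' columns.
or_rows = conn.execute("""
    SELECT customer.id FROM customer
    LEFT JOIN customer_id ON customer.id = customer_id.customer_id
    WHERE ssn = '123456789' OR id_num = '123456789'
""").fetchall()

# UNION rewrite: each branch can use one index on its own.
union_rows = conn.execute("""
    SELECT customer.id FROM customer WHERE ssn = '123456789'
    UNION
    SELECT customer_id.customer_id FROM customer_id WHERE id_num = '123456789'
""").fetchall()

print(sorted(or_rows) == sorted(union_rows))  # True: both find customers 1 and 2
```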




Re: Query Optimization

2009-01-13 Thread Andrew Garner
On Tue, Jan 13, 2009 at 7:07 PM, Baron Schwartz ba...@xaprb.com wrote:
 If you have separate indexes on ssn and id_num, MySQL may be able to
 efficiently use an index merge optimization .   See
 http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html.
 This is only in 5.0+ - on older versions of MySQL you may find a union
 more efficient.

 And in newer versions, too.  The optimizer frequently underestimates
 the cost of the merge operation and the required random I/O for row
 lookups.  So, yes it can use an index merge, but... efficiency is
 another question.  I've seen table scans outperform a two-way index
 merge by orders of magnitude.

These appeared to be high selectivity indexes, but perhaps I assumed
too much. :)




Query Optimization

2009-01-12 Thread Johnny Withers
I have the following tables:

Customer: id,ssn
Customer_Id: id,customer_id,id_num

The customer table holds customers along with their SSN and the customer_id
table holds identifications for each customer (Driver's License, State
Issued ID, Student ID, etc). The SSN column from the customer table is
VARCHAR(9) and the id_num column from the customer_id table is VARCHAR(32).
Both of these columns have an index on them.

The following query uses the index on customer.ssn and executes in 0ms:

SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num
FROM customer USE INDEX(idx_ssn)
LEFT JOIN customer_id ON customer.id=customer_id.customer_id
WHERE ssn='123456789';

Explain output:

*** 1. row ***
   id: 1
  select_type: SIMPLE
table: customer
 type: ref
possible_keys: idx_ssn
  key: idx_ssn
  key_len: 35
  ref: const
 rows: 1
Extra: Using where; Using index
*** 2. row ***
   id: 1
  select_type: SIMPLE
table: customer_id
 type: ref
possible_keys: customer_key
  key: customer_key
  key_len: 5
  ref: aca_ecash.customer.id
 rows: 1
Extra:

Now, this is the query I have trouble with, it does not use the index (or
says it does but doesn't?) and on a busy system (200+ queries per sec) can
take up to 20 seconds or more to execute:

SELECT SQL_NO_CACHE customer.id,customer.ssn,customer_id,id_num
FROM customer USE INDEX(idx_ssn)
LEFT JOIN customer_id ON customer.id=customer_id.customer_id
WHERE ssn='123456789' OR id_num='123456789';

Explain output:

*** 1. row ***
   id: 1
  select_type: SIMPLE
table: customer
 type: index
possible_keys: idx_ssn
  key: idx_ssn
  key_len: 35
  ref: NULL
 rows: 165843
Extra: Using index
*** 2. row ***
   id: 1
  select_type: SIMPLE
table: customer_id
 type: ref
possible_keys: customer_key
  key: customer_key
  key_len: 5
  ref: aca_ecash.customer.id
 rows: 1
Extra: Using where


Is there some way I can make it use the index? I've thought about
redesigning the query to select from the customer_id table first, if a row
is found.. just return the matching customer_id from the customer table..
but I wanted to see if maybe i'm going about this the wrong way before I
engineer some way around this.

Thanks in advance,

-
Johnny Withers
601.209.4985
joh...@pixelated.net


Re: Query optimization help

2008-08-12 Thread Brent Baisley
First, you might want to move the WHERE...t3.int_a = some integer  
condition into the join condition for t3.
You're not using anything from t4, so I'm not sure why you have that  
table in your query.
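The LEFT JOIN detail behind Brent's first suggestion can be seen with a tiny sqlite3 experiment (invented data, tables named after the thread's t1/t3): a filter on the right-hand table in the WHERE clause discards the NULL-extended rows, while the same filter in the ON clause keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (int_a INTEGER, string TEXT);
    CREATE TABLE t3 (int_a INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO t3 VALUES (1);
""")

# Filter in WHERE: NULL-extended rows fail t3.int_a = 1, so the
# LEFT JOIN silently behaves like an INNER JOIN.
where_rows = conn.execute("""
    SELECT t1.string FROM t1
    LEFT JOIN t3 ON t1.int_a = t3.int_a
    WHERE t3.int_a = 1
""").fetchall()

# Filter in ON: the condition only restricts which t3 rows join;
# every t1 row survives.
on_rows = conn.execute("""
    SELECT t1.string FROM t1
    LEFT JOIN t3 ON t1.int_a = t3.int_a AND t3.int_a = 1
""").fetchall()

print(where_rows)       # [('a',)]
print(sorted(on_rows))  # [('a',), ('b',)]
```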


You can suggest or force mysql to use an index if it's using the wrong  
one:

http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

On very rare occasions I've had to do this. What's happening is that  
mysql is analyzing the information it has about the data and indexes  
and coming to the wrong conclusion, perhaps even opting for an entire  
table scan. You can run ANALYZE TABLE to force mysql to update the  
information it has about the data. This may actually solve your problem.


Try SHOW INDEX FROM t1 to see what data mysql has about the indexes.  
Sometimes the CARDINALITY (uniqueness) column will be null which can  
indicate a problem.


Posting the result of your EXPLAIN will actually be helpful.

Hope that helps.

Brent Baisley


On Aug 11, 2008, at 8:26 AM, Jonathan Terhorst wrote:


I have this query:

SELECT DISTINCT t1.string FROM t1
LEFT JOIN t2 ON t1.string=t2.string
LEFT JOIN t3 ON t1.int_a=t3.int_a
LEFT JOIN t4 ON t1.int_b=t4.int_b
WHERE
t1.string != '' AND
t2.string IS NULL AND
t3.int_a = some integer
ORDER BY
t1.string ASC

This query is executing slower than it should. EXPLAIN has it using  
temporary and using filesort.
I have indexes on every column in the query, but I think the problem  
is the one-index-per-table limitation. According to EXPLAIN, there  
are two possible indices in use for t1 (int_a and string), but only  
int_a is being used. So I tried constructing a compound index on  
int_a and string. Although this new index appears in possible_keys,  
EXPLAIN still shows the key actually being used as int_a. I tried  
building the compound key in both orders and had the same results.  
How do I get mysql to use all possible keys on t1 when running the query?  
Thanks!











Query optimization help

2008-08-11 Thread Jonathan Terhorst

I have this query:

SELECT DISTINCT t1.string FROM t1
LEFT JOIN t2 ON t1.string=t2.string
LEFT JOIN t3 ON t1.int_a=t3.int_a
LEFT JOIN t4 ON t1.int_b=t4.int_b
WHERE
t1.string != '' AND
t2.string IS NULL AND
t3.int_a = some integer
ORDER BY
t1.string ASC

This query is executing slower than it should. EXPLAIN has it using  
temporary and using filesort.
I have indexes on every column in the query, but I think the problem  
is the one-index-per-table limitation. According to EXPLAIN, there are  
two possible indices in use for t1 (int_a and string), but only int_a  
is being used. So I tried constructing a compound index on int_a and  
string. Although this new index appears in possible_keys, EXPLAIN  
still shows the key actually being used as int_a. I tried building the  
compound key in both orders and had the same results. How do I get mysql  
to use all possible keys on t1 when running the query? Thanks!







Re: Query optimization

2008-01-22 Thread Joris Kinable
-What I'm trying to do:
A bit hard to explain. I've got a table consisting of ip addresses
(ipv4_src), destination addresses (ipv4_dst), and port numbers
(port_dst), plus some other irrelevant columns. Ultimately my goal is to
find a linear order in a subset of ports. For example, if host A connects
to B on ports 1, 2, 3, 4, ... 20, I would like to find this linear
relation. To achieve this, I have written some mathematical formulas.
Unfortunately there is one downside to my formula: it can be fooled by
adding some extreme values. It cannot find a linear relation in this
list: 1, 2, 3, 4, ... 20, 45000. Although there are 20 numbers lined up,
the 45000 ruins the result of the algorithm. So the query I've
submitted is meant to remove extreme values. The methods
boxplot(pF.port_dst,LOW) and boxplot(pF.port_dst,HIGH) calculate
in linear time the allowed range of numbers. Extreme values won't be
included in this range. So in the example, the range would be [1,20],
thereby omitting the value 45000. Finally I would like to filter my
table of port numbers for every (ipv4_src, ipv4_dst) tuple and
remove all the numbers not fitting in the range.
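Joris's boxplot() UDF itself isn't shown in the thread. A plausible stand-in (an assumption for illustration, not his actual code) is the standard Tukey box-plot fence: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] counts as extreme. In Python:

```python
import statistics

def boxplot_fences(values):
    """Tukey fences: the allowed [low, high] range for non-extreme values."""
    q1, _, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

ports = list(range(1, 21)) + [45000]   # ports 1..20 plus one extreme value
low, high = boxplot_fences(ports)
kept = [p for p in ports if low <= p <= high]
print(kept)  # the 45000 outlier is dropped, 1..20 survive
```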

-In human readable pseudo code this is the query:

SELECT source,dest,port,octets FROM
(
SELECT source,dest,boxplot(port,LOW) AS low,boxplot(port,HIGH) AS
high FROM --Calculate the LOW and HIGH values for each source,dest
pair.
(
SELECT source,dest,port,octets FROM... GROUP BY source,dest,port
--This removes the duplicate entries.
) pF GROUP BY source,dest
) boxplot
(
SELECT source,dest,port,octets FROM... GROUP BY source,dest,port
--This removes the duplicate entries (again!).
) filter
WHERE filter.source=boxplot.source AND filter.dest=boxplot.dest AND
filter.port>=boxplot.LOW AND filter.port<=boxplot.HIGH --Relate the
tables 'boxplot' and 'filter' to each other AND select only the
source,dest,port tuples where port is in the range [LOW,HIGH] from
the filter table.


-Here is the original query I would like to optimize again:

SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM
(
SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,LOW) AS
low,boxplot(pF.port_dst,HIGH) AS high FROM
(
SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ... GROUP BY
ipv4_src, ipv4_dst, port_dst
) pF GROUP BY pF.ipv4_src, pF.ipv4_dst
) boxplot,
(
SELECT ipv4_src, ipv4_dst, port_dst, octets FROM ... GROUP BY
ipv4_src, ipv4_dst, port_dst
)filter
WHERE filter.ipv4_src=boxplot.ipv4_src AND
filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND
filter.port_dst<=boxplot.high

I'll try the 'EXPLAIN' query tonight, but therefor I've got to create
a small database first to speed up the results.

Thnx in advance for your help.



On Jan 22, 2008 8:15 AM, Sebastian Mendel [EMAIL PROTECTED] wrote:
 Joris Kinable schrieb:

  Optimize query
 
  I've got one query, which I would like to improve a lot since it takes
  very long (24 hours) to execute. Here is the idea:
  1. Take the table (ipv4_src, ipv4_dst, port_dst) (other columns in this
  table are not mentioned for clarity) and remove all duplicate
  tuples. This is done by subquery 'filter'.
  2. The same query is performed by the boxplot query, but this time an
  additional group by command is executed, thereby calculating a User
  Defined Function boxplot(row,type) which returns a double value.
  3. Finally the results of the query in step 2 are used to select a
  subset of results from the 'filter' table.
  4. As you can see, the subquery 'pF' used in step 2 is identical to
  the query 'filter'. It's an extreme waste to calculate the same table
  twice. I've tried to create a temporary table from filter, but
  unfortunately Mysql doesn't allow you to access a temporary table
  twice in the same query. I prefer a 1 query answer, instead of
  creating views, or temporary tables.
 
  Is there a way to improve this query, thereby improving the execution time?
 
  Query:
 
  SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM
  (
SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,LOW) AS
  low,boxplot(pF.port_dst,HIGH) AS high FROM
(
SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE
  prot=6 GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY
  ipv4_src,ipv4_dst,port_dst ASC
  ) pF GROUP BY pF.ipv4_src, pF.ipv4_dst HAVING
  COUNT(filter.port_dst)>10
  ) boxplot,
  (
SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
  GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY
  ipv4_src,ipv4_dst,port_dst ASC
  ) filter
  WHERE filter.ipv4_src=boxplot.ipv4_src AND
  filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND
  filter.port_dst<=boxplot.high

 what you are trying to do?

 and how about formating your query in a human readable way?

 did you tried EXPLAIN?

 what type of syntax is this: boxplot(pF.port_dst,LOW) ?


 --
 Sebastian


--

Query optimization

2008-01-21 Thread Joris Kinable
Optimize query

I've got one query, which I would like to improve a lot since it takes
very long (24 hours) to execute. Here is the idea:
1. Take the table (ipv4_src, ipv4_dst, port_dst) (other columns in this
table are not mentioned for clarity) and remove all duplicate
tuples. This is done by subquery 'filter'.
2. The same query is performed by the boxplot query, but this time an
additional group by command is executed, thereby calculating a User
Defined Function boxplot(row,type) which returns a double value.
3. Finally the results of the query in step 2 are used to select a
subset of results from the 'filter' table.
4. As you can see, the subquery 'pF' used in step 2 is identical to
the query 'filter'. It's an extreme waste to calculate the same table
twice. I've tried to create a temporary table from filter, but
unfortunately Mysql doesn't allow you to access a temporary table
twice in the same query. I prefer a 1 query answer, instead of
creating views, or temporary tables.

Is there a way to improve this query, thereby improving the execution time?

Query:

SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM
(
SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,LOW) AS
low,boxplot(pF.port_dst,HIGH) AS high FROM
(
SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE
prot=6 GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY
ipv4_src,ipv4_dst,port_dst ASC
) pF GROUP BY pF.ipv4_src, pF.ipv4_dst HAVING COUNT(filter.port_dst)>10
) boxplot,
(
SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY
ipv4_src,ipv4_dst,port_dst ASC
) filter
WHERE filter.ipv4_src=boxplot.ipv4_src AND
filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND
filter.port_dst<=boxplot.high
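Joris asks for a single-query way to avoid computing 'filter' twice. MySQL later gained common table expressions (WITH, in MySQL 8.0), which address exactly this: name the deduplicated result once and reference it from both the aggregate and the final filter. A simplified sketch, run here against SQLite (whose WITH syntax is the same); MIN/MAX stand in for the custom boxplot UDF and the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (ipv4_src TEXT, ipv4_dst TEXT, port_dst INTEGER, prot INTEGER);
    INSERT INTO mytable VALUES
        ('a', 'b', 1, 6), ('a', 'b', 1, 6),   -- duplicate tuple, deduplicated below
        ('a', 'b', 2, 6), ('a', 'b', 9, 6),
        ('c', 'd', 7, 17);                    -- wrong protocol, filtered out
""")

rows = conn.execute("""
    WITH filter AS (             -- computed once, referenced twice
        SELECT ipv4_src, ipv4_dst, port_dst
        FROM mytable WHERE prot = 6
        GROUP BY ipv4_src, ipv4_dst, port_dst
    ),
    bounds AS (                  -- stand-in for boxplot(...,'LOW'/'HIGH')
        SELECT ipv4_src, ipv4_dst, MIN(port_dst) AS low, MAX(port_dst) AS high
        FROM filter GROUP BY ipv4_src, ipv4_dst
    )
    SELECT f.ipv4_src, f.ipv4_dst, f.port_dst
    FROM filter f JOIN bounds b
      ON f.ipv4_src = b.ipv4_src AND f.ipv4_dst = b.ipv4_dst
    WHERE f.port_dst BETWEEN b.low AND b.high
    ORDER BY f.port_dst
""").fetchall()
print(rows)  # [('a', 'b', 1), ('a', 'b', 2), ('a', 'b', 9)]
```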




Re: Query optimization

2008-01-21 Thread Sebastian Mendel

Joris Kinable schrieb:

Optimize query

I've got one query, which I would like to improve a lot since it takes
very long (24 hours) to execute. Here is the idea:
1. Take the table (ipv4_src, ipv4_dst, port_dst) (other columns in this
table are not mentioned for clarity) and remove all duplicate
tuples. This is done by subquery 'filter'.
2. The same query is performed by the boxplot query, but this time an
additional group by command is executed, thereby calculating a User
Defined Function boxplot(row,type) which returns a double value.
3. Finally the results of the query in step 2 are used to select a
subset of results from the 'filter' table.
4. As you can see, the subquery 'pF' used in step 2 is identical to
the query 'filter'. It's an extreme waste to calculate the same table
twice. I've tried to create a temporary table from filter, but
unfortunately Mysql doesn't allow you to access a temporary table
twice in the same query. I prefer a 1 query answer, instead of
creating views, or temporary tables.

Is there a way to improve this query, thereby improving the execution time?

Query:

SELECT filter.ipv4_src, filter.ipv4_dst, filter.port_dst, filter.octets FROM
(
SELECT pF.ipv4_src, pF.ipv4_dst, boxplot(pF.port_dst,LOW) AS
low,boxplot(pF.port_dst,HIGH) AS high FROM
(
SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE
prot=6 GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY
ipv4_src,ipv4_dst,port_dst ASC
) pF GROUP BY pF.ipv4_src, pF.ipv4_dst HAVING COUNT(filter.port_dst)>10
) boxplot,
(
SELECT ipv4_src, ipv4_dst, port_dst, octets FROM mytable WHERE prot=6
GROUP BY ipv4_src, ipv4_dst, port_dst ORDER BY
ipv4_src,ipv4_dst,port_dst ASC
) filter
WHERE filter.ipv4_src=boxplot.ipv4_src AND
filter.ipv4_dst=boxplot.ipv4_dst AND filter.port_dst>=boxplot.low AND
filter.port_dst<=boxplot.high


what you are trying to do?

and how about formating your query in a human readable way?

did you tried EXPLAIN?

what type of syntax is this: boxplot(pF.port_dst,LOW) ?


--
Sebastian




Re: help wit query optimization (cont'd)

2008-01-09 Thread Sebastian Mendel
Perrin Harkins schrieb:
 On Jan 4, 2008 5:51 PM, Eben [EMAIL PROTECTED] wrote:
 The resultset is paginated on the front end using the
 SQL_CALC_FOUND_ROWS functionality...
 
 Usually a bad idea:
 http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/

usually ... but i do not know of any index capable of having FULLTEXT and
'normal' fields in one index

 WHERE MATCH table_2.field AGAINST ('value')
 AND table_2.current = 1
 AND table_2.shared_id IN (SELECT shared_id FROM table_1_view) 

or am i wrong?

-- 
Sebastian




Re: help wit query optimization (cont'd)

2008-01-09 Thread Perrin Harkins
On Jan 9, 2008 4:33 AM, Sebastian Mendel [EMAIL PROTECTED] wrote:
 usually ... but i do not know of any index capable of having FULLTEXT and
 'normal' fields in one index

Does that matter?  It would have to be doing a full scan for
SQL_CALC_FOUND_ROWS to work out well.

- Perrin




Re: help wit query optimization (cont'd)

2008-01-09 Thread Sebastian Mendel
Perrin Harkins schrieb:
 On Jan 9, 2008 4:33 AM, Sebastian Mendel [EMAIL PROTECTED] wrote:
 usually ... but i do not know of any index capable of having FULLTEXT and
 'normal' fields in one index
 
 Does that matter?

yes, as written in the mentioned article the test is only relevant with
correctly used indexes, but MySQL does not use more than one index, so this
query cannot all be done with indexes


please correct me if i am wrong

-- 
Sebastian






Re: help wit query optimization (cont'd)

2008-01-09 Thread Perrin Harkins
On Jan 9, 2008 8:34 AM, Sebastian Mendel [EMAIL PROTECTED] wrote:
 yes, as written in the mentioned article the test is only relevant with
 correctly used indexes, but MySQL does not use more than one index, so this
 query cannot all be done with indexes

Well, first of all, MySQL 5 does use more than one index, although I'm
not sure it can do this with a full text index.
http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html

Also, there's a lot of room between a full table scan and using every
index.  Using any index in a way that allows the query to be satisfied
without scanning every single row should be enough to make the count
query better.

- Perrin




Re: help with query optimization

2008-01-09 Thread Brent Baisley
Your biggest problem is probably the subquery/IN you are performing.  
You should change that to a join. And I don't know about using  
SQL_CALC_FOUND_ROWS in a full text query that's not boolean, and you  
shouldn't use it unless you have a LIMIT clause.


SELECT SQL_CALC_FOUND_ROWS table_1.id,table_2.*
FROM table_2
JOIN
(SELECT shared_id FROM table_1_view) as table_3 ON  
table_2.shared_id=table_3.shared_id

LEFT JOIN table_1 ON table_2.shared_id = table_1.shared_id
WHERE MATCH table_2.field AGAINST ('value')
AND table_2.current = 1

I know the difference doesn't seem that much, but MySQL optimizes it  
very differently.
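A quick sanity check that Brent's JOIN form returns the same rows as the IN form, sketched with Python's sqlite3 module on invented toy data (SQLite's optimizer differs from MySQL's, so this only demonstrates equivalence, not the speed difference):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (shared_id INTEGER, some_field TEXT);
    CREATE TABLE table_2 (shared_id INTEGER, field TEXT, current INTEGER);
    INSERT INTO table_1 VALUES (1, 'some_value_x'), (2, 'other');
    INSERT INTO table_2 VALUES (1, 'value', 1), (2, 'value', 1), (1, 'value', 0);
""")

# Original shape: IN against a subquery.
in_rows = conn.execute("""
    SELECT shared_id FROM table_2
    WHERE current = 1
      AND shared_id IN (SELECT shared_id FROM table_1
                        WHERE some_field LIKE 'some_value%')
""").fetchall()

# Brent's shape: join against the derived table instead.
join_rows = conn.execute("""
    SELECT table_2.shared_id FROM table_2
    JOIN (SELECT shared_id FROM table_1
          WHERE some_field LIKE 'some_value%') AS table_3
      ON table_2.shared_id = table_3.shared_id
    WHERE current = 1
""").fetchall()

print(sorted(in_rows) == sorted(join_rows))  # True
```

Note the join form can emit duplicates if shared_id repeats in table_1, which the IN form would not; the toy data avoids that case.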


Brent


On Jan 4, 2008, at 5:47 PM, Eben wrote:


Hi,

I have a query that has to run on a full text indexed table with  
many millions of records.  I'm trying to figure out some  
optimizations for it.  Here's the general query:


SELECT SQL_CALC_FOUND_ROWS table_1.id,table_2.*
FROM table_2
LEFT JOIN table_1 ON table_2.shared_id = table_1.shared_id
WHERE MATCH table_2.field AGAINST ('value')
AND table_2.current = 1
AND table_2.shared_id IN (SELECT shared_id FROM table_1_view)

Some details:
1. table_1.shared_id and table_2.shared_id are indexed
2. table_2.current is an indexed tinyint
3. table_1_view is derived from a query like:

SELECT shared_id
FROM table_1
WHERE some_field LIKE 'some_value%'

table_1 is a relatively small table, i.e. < 100k records
table_2 is massive with > 10 million records

Any ideas or suggestions are appreciated

thanks,
Eben











help wit query optimization (cont'd)

2008-01-04 Thread Eben

I left something out, the query looks like:
SELECT SQL_CALC_FOUND_ROWS table_1.id,table_2.*
FROM table_2
LEFT JOIN table_1 ON table_2.shared_id = table_1.shared_id
WHERE MATCH table_2.field AGAINST ('value')
AND table_2.current = 1
AND table_2.shared_id IN (SELECT shared_id FROM table_1_view)
LIMIT 0,10

The resultset is paginated on the front end using the 
SQL_CALC_FOUND_ROWS functionality...
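The alternative usually recommended for pagination is two plain queries, a LIMITed page plus a separate COUNT(*), so each can use the best access path instead of forcing a full pass for SQL_CALC_FOUND_ROWS. A minimal sketch of that pattern with Python's sqlite3 module (toy table, invented names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [(i, i * 10) for i in range(1, 26)])   # 25 matching rows

page_size, page = 10, 0

# Query 1: just the requested page.
rows = conn.execute(
    "SELECT id FROM results ORDER BY id LIMIT ? OFFSET ?",
    (page_size, page * page_size)).fetchall()

# Query 2: the total, for rendering page links.
total = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]

print(len(rows), total)  # 10 25
```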


thanks,
Eben






Re: help wit query optimization (cont'd)

2008-01-04 Thread Perrin Harkins
On Jan 4, 2008 5:51 PM, Eben [EMAIL PROTECTED] wrote:
 The resultset is paginated on the front end using the
 SQL_CALC_FOUND_ROWS functionality...

Usually a bad idea:
http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/

- Perrin




help with query optimization

2008-01-04 Thread Eben

Hi,

I have a query that has to run on a full text indexed table with many 
millions of records.  I'm trying to figure out some optimizations for 
it.  Here's the general query:


SELECT SQL_CALC_FOUND_ROWS table_1.id,table_2.*
FROM table_2
LEFT JOIN table_1 ON table_2.shared_id = table_1.shared_id
WHERE MATCH table_2.field AGAINST ('value')
AND table_2.current = 1
AND table_2.shared_id IN (SELECT shared_id FROM table_1_view)

Some details:
1. table_1.shared_id and table_2.shared_id are indexed
2. table_2.current is an indexed tinyint
3. table_1_view is derived from a query like:

SELECT shared_id
FROM table_1
WHERE some_field LIKE 'some_value%'

table_1 is a relatively small table, i.e. < 100k records
table_2 is massive with > 10 million records

Any ideas or suggestions are appreciated

thanks,
Eben






Re: help wit query optimization (cont'd)

2008-01-04 Thread Moon's Father
I've never used this feature before on my application.

On Jan 5, 2008 7:06 AM, Perrin Harkins [EMAIL PROTECTED] wrote:

 On Jan 4, 2008 5:51 PM, Eben [EMAIL PROTECTED] wrote:
  The resultset is paginated on the front end using the
  SQL_CALC_FOUND_ROWS functionality...

 Usually a bad idea:

 http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/

 - Perrin





-- 
I'm a mysql DBA in china.
More about me just visit here:
http://yueliangdao0608.cublog.cn


Query Optimization

2007-11-19 Thread dpgirago
I need to check a date_time value in a number of tables for 
up-to-date-ness. 

The following query is part of a larger PHP script that runs as a cron job 
every 10 minutes:

<query>
select 
case 
# If within 2 minutes, do nothing.
when (unix_timestamp() - unix_timestamp(date_time)) < 120 
then 'ok' 
 
# If between 2 and 60 minutes old, send an email each time 
# the script is called (q 10 min).
when (unix_timestamp() - unix_timestamp(date_time)) >= 120 
&& 
(unix_timestamp() - unix_timestamp(date_time)) < 3600 
then 'email'

# If over an hour old, send out one email per hour. 
when (unix_timestamp() - unix_timestamp(date_time)) >= 3600 && 
(unix_timestamp() - unix_timestamp(date_time)) % 3600 > 2999 && 
(unix_timestamp() - unix_timestamp(date_time)) % 3600 < 3600 
then 'email'

end 
as 'test'

from mytable order by date_time desc limit 1;

</query>

This seems to run OK, but I'd be interested if anyone sees any way to 
improve it.
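Restated in Python, a sketch of the same three time buckets (the comparison operators in the SQL above were lost in the archive, so the directions here are reconstructed from the comments; returning None mirrors the CASE falling through with no match):

```python
def alert_state(age_seconds):
    """Map the age of the newest date_time row to an action."""
    if age_seconds < 120:
        return "ok"          # within 2 minutes: do nothing
    if age_seconds < 3600:
        return "email"       # 2-60 minutes old: email on every 10-minute run
    if 2999 < age_seconds % 3600 < 3600:
        return "email"       # over an hour old: fire once per hour,
                             # during the last ~10 minutes of each hour
    return None              # no action on this run

print([alert_state(s) for s in (60, 600, 3800, 7000)])
# ['ok', 'email', None, 'email']
```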

Thanks,

--David.

Query Optimization

2007-03-12 Thread Drew Kutcharian
Hi All,

How can I optimize the following 3-way JOIN SQL query given the following 
schema:
 
 SELECT 
 ORDER.ID, ORDER.STATUS, ORDER.ORDER_TIME, ORDER_ITEM.ID, ORDER_ITEM.QUANTITY, 
PRODUCT.SIZE, PRODUCT.SHAPE, PRODUCT.PAPER, PRODUCT.TURNAROUND
 FROM
 ORDER, ORDER_ITEM, PRODUCT
 WHERE 
 ORDER.ID = ORDER_ITEM.ORDER_ID AND
 ORDER_ITEM.PRODUCT_ID = PRODUCT.PRODUCT_ID AND
 
 ORDER.STATUS = status AND
 ORDER.TIME > 'startTime' AND
 ORDER.TIME < 'endTime' AND
 
 ORDER_ITEM.QUANTITY = quantity AND
 
 PRODUCT.SIZE = 'size' AND
 PRODUCT.SHAPE = 'shape' AND
 PRODUCT.PAPER = 'paper' AND
 PRODUCT.TURNAROUND = 'turnaround'
 
 ORDER BY ORDER.ORDER_TIME DESC
 LIMIT start, offset;
 
 
 [ORDER TABLE]
 row count = 350,544 (read/write. mostly read)
 ++-+--+-+-+
 | Field  | Type| Null | Key | Default |
 ++-+--+-+-+
 | ID | int(10) unsigned| NO   | PRI | 0   |
 | STATUS | tinyint(3) unsigned | NO   | MUL | 0   |
 | ORDER_TIME | datetime| YES  | | NULL|
 
 
 [ORDER_ITEM TABLE]
 row count = 548,456 (read/write. mostly read)
 ++-+--+-+-+
 | Field  | Type| Null | Key | Default |
 ++-+--+-+-+
 | ID | int(10) unsigned| NO   | PRI | 0   |
 | ORDER_ID   | int(10) unsigned| NO   | PRI | 0   |
 | PRODUCT_ID | int(10) unsigned| YES  | | 0   |
 | QUANTITY   | int(10) unsigned| YES  | | NULL|
 
 
 [PRODUCT TABLE]
 row count = 56,641 (static content, hardly ever changes)
 +---+--+--+-+-+
 | Field | Type | Null | Key | Default |
 +---+--+--+-+-+
 | PRODUCT_ID| int(10) unsigned | NO   | PRI | 0   |
 | SIZE  | varchar(50)  | YES  | | NULL| (indexed)
 | PAPER | varchar(50)  | YES  | | NULL| (indexed)
 | TURNAROUND| varchar(50)  | YES  | | NULL| (indexed)
 | SHAPE | varchar(50)  | YES  | | NULL| (indexed)

Thanks,
Drew



Insert ... Select Max() .. transactional Query optimization on an InnoDB table

2006-08-10 Thread Asif Lodhi

Hi,

Using MySQL-5.0.22/Win-XP-SP2 and storing data in InnoDB tables.
Clients connect from VB6.  Enclosing all transactions in  START
TRANSACTION ... COMMIT statements from VB6 clients.

I have the following query for a table tmp2 with a column x of data-type INT.

Insert into tmp2(x,y,z)
Select ((Max(x))-((Max(x) div 100) * 100))+1 as pid, 'text1','text2'
from tmp2
where ((x div 100)=2147)
having (((max(x))-((max(x) div 100) * 100)) < 483647);

I have the following questions:

1) Will the above INSERT statement run if I use constant values for y
and z? without a
   GROUP BY clause?
2) How can I make the above select statement return a 0 (ZERO) value
for pid  in case it
   returns an empty resultset?  Though I am also thinking about it
but it'd be great if you guys
   could give me some hints.

I want to keep it fast so I haven't used any functions, etc.
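For question 2, one common hint (a general pattern, not a drop-in rewrite of the exact INSERT above): wrap the SELECT in a scalar subquery and apply COALESCE, which maps an empty or NULL result to 0. Demonstrated with Python's sqlite3 module; note SQLite's integer `/` plays the role of MySQL's `div`, and the values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp2 (x INTEGER)")
conn.execute("INSERT INTO tmp2 VALUES (214700005)")

def next_pid(bucket):
    # COALESCE turns a NULL/empty scalar subquery into 0.
    return conn.execute(
        """SELECT COALESCE(
               (SELECT MAX(x) - (MAX(x) / 100) * 100 + 1
                FROM tmp2 WHERE x / 100 = ?), 0)""",
        (bucket,)).fetchone()[0]

print(next_pid(2147000))   # 6: derived from the existing row
print(next_pid(9999999))   # 0: no rows match, COALESCE supplies the default
```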

--
Thanks in advance,

Asif




Re: Query Optimization Question

2006-03-14 Thread Michael Stassen

[EMAIL PROTECTED] wrote:
Yes, a ranged query should respond faster than a negation. In some cases 
you can seriously improve query performance for a negation query if you 
split it into two range queries unioned together. Here is a pseudo 
example:


This query should be slow due to the table scan it takes to test the 
condition:


SELECT ...
FROM ...
WHERE indexfield <> 16


This query will use the index if the number of rows with indexfield != 16 is 
small enough (< about 30%).


This query should be noticeably faster due to using ranged queries 
(partial index ranged matches):


(   SELECT ...
FROM ...
WHERE indexfield < 16
) UNION (
SELECT ...
FROM ...
WHERE indexfield > 16
)


At the very least, you'll want to make that UNION ALL, rather than just UNION, 
so mysql doesn't waste time looking for duplicates to remove.  But even then, 
this may be slower.


First, these two unioned queries may still not use the index.  If the number of 
rows with indexfield < 16 is too large (more than about 30%) the first query 
will be a table scan.  If the number of rows with indexfield > 16 is too large 
(more than about 30%) the second query will be a table scan.  In fact, if the 
number of rows with indexfield = 16 is less than about 40% of the table, then at 
least one of the two unioned queries is guaranteed to be a table scan.  Worse 
yet, this query stands a good chance of being twice as long as the single, != 
query, because it may require 2 table scans!


Second, even if both unioned queries use the index, the result still may not be 
faster.  If the combined number of matching rows is too high, the full table 
scan should be faster than the indexed lookups.


For example:

  SELECT cat, COUNT(*) FROM inits GROUP BY cat;
  +--+--+
  | cat  | COUNT(*) |
  +--+--+
  |0 | 5743 |
  |1 | 3792 |
  |2 |30727 |
  |3 | 1926 |
  |4 | 7812 |
  +--+--+

19273 rows (38.55%) match cat != 2, with roughly half (~19%) on either side.

First, the != case:

  EXPLAIN SELECT * FROM inits WHERE cat != 2 \G
  *** 1. row ***
 id: 1
select_type: SIMPLE
  table: inits
   type: ALL
  possible_keys: cat_idx
key: NULL
key_len: NULL
ref: NULL
   rows: 5
  Extra: Using where

As expected, a full table scan.

  SELECT * FROM inits WHERE cat != 2;
  ...
  19273 rows in set (0.37 sec)

Now, the unioned range queries:

  EXPLAIN SELECT * FROM inits WHERE cat < 2
  UNION ALL
  SELECT * FROM inits WHERE cat > 2 \G
  *** 1. row ***
 id: 1
select_type: PRIMARY
  table: inits
   type: range
  possible_keys: cat_idx
key: cat_idx
key_len: 5
ref: NULL
   rows: 5680
  Extra: Using where
  *** 2. row ***
 id: 2
select_type: UNION
  table: inits
   type: range
  possible_keys: cat_idx
key: cat_idx
key_len: 5
ref: NULL
   rows: 6543
  Extra: Using where
  *** 3. row ***
 id: NULL
select_type: UNION RESULT
  table: union1,2
   type: ALL
  possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
   rows: NULL
  Extra:


As hoped, mysql plans to use the index for each query.

  SELECT * FROM inits WHERE cat < 2
  UNION ALL
  SELECT * FROM inits WHERE cat > 2;
  ...
  19273 rows in set (0.78 sec)

Despite (because of) using the index, this takes more than twice as long!

Of course, if you have to do a table scan ANYWAY (because a value you have 
in a constraint is not in an index) then this won't help. This 
optimization is all about getting the engine to use an index whenever it 
can instead of performing a table scan. Of course, that is not always 
possible.


Even when using an index is possible, it is not always desirable.  I'd suggest 
not trying to outfox the optimizer until you've first determined it is making 
bad choices, and then test to make sure the solution is actually an improvement.
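One simple way to run such a test, assuming the query cache is enabled on the 
server, is to time the competing forms with the cache bypassed so each run 
exercises the actual plan:

```sql
-- Benchmarking sketch: SQL_NO_CACHE keeps repeat runs from being
-- served out of the query cache instead of executing the plan.
SELECT SQL_NO_CACHE * FROM inits WHERE cat != 2;

SELECT SQL_NO_CACHE * FROM inits WHERE cat < 2
UNION ALL
SELECT * FROM inits WHERE cat > 2;
```

Compare the reported times over several runs; a single timing can be skewed by 
a cold buffer pool or OS cache.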


Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Query Optimization Question

2006-03-13 Thread SGreen
Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43 PM:

 In a previous database engine I was using, an IN was more optimal than a
 <>. So, for example:
 
 SELECT * FROM table WHERE table.type IN (1,2,3);
 
 Where the possible values of type are 0-3, was appreciably faster than:
 
 SELECT * FROM table WHERE table.type <> 0;
 
 I've been playing with the Query Browser and checking out the
 optimization documents and haven't been able to make a clear call on
 whether or not this is also the case with MySQL/InnoDB.
 
 TIA,
 
 R.
 
 

YES, YES, YES! This is definitely an optimization. 

When you say IN or =, you are asking for matching values. Matches can 
come from indexes. When you say <> or NOT IN, you are asking for 
everything BUT matches.  In order to evaluate a negative, the database 
engine (and this is usually true regardless of database server) almost 
always performs a full table scan to test every row to make sure it is 
either <> or NOT IN. At the very best, they have to perform a full index 
scan which is still less efficient than ranged or values-based lookups.

It's when you get into the situation where you are matching against dozens 
of IN-clause items that you may run into slowdowns again. Until you reach 
2 or 3 dozen terms (depending on your hardware) you should be faster with 
an IN comparison than a <> or a NOT IN comparison. An optimization to 
search for all BUT a term or two is to create a temporary table of all of your 
terms and delete the exact ones you want to exclude. Put an index on your 
temp table then JOIN that back into your query again (replacing the huge 
IN clause).  The database will match index to index and things will get 
fast again. This technique can scale up to some really big queries.
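
As a sketch of that temp-table technique (the `codes` lookup table, the values 
5 and 9, and table `t` are all hypothetical, not from this thread):

```sql
-- Build the affirmative list once, index it, then JOIN instead of NOT IN.
CREATE TEMPORARY TABLE wanted (code INT NOT NULL, KEY (code));
INSERT INTO wanted SELECT code FROM codes;     -- all possible values
DELETE FROM wanted WHERE code IN (5, 9);       -- drop the few exclusions

SELECT t.*
FROM t
JOIN wanted w ON w.code = t.type;              -- index-to-index match
```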

Always try to code for the affirmative tests. Your users will thank you.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

RE: Query Optimization Question

2006-03-13 Thread Robert DiFalco
Shawn,
 
Any performance gains for specifying type > 0 than type <> 0 ?
 
R.



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 13, 2006 6:37 AM
To: Robert DiFalco
Cc: mysql@lists.mysql.com
Subject: Re: Query Optimization Question




Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43
PM:

 In a previous database engine I was using, an IN was more optimal than
a
 <>. So, for example:
  
 SELECT * FROM table WHERE table.type IN (1,2,3);
  
 Where the possible values of type are 0-3, was appreciably faster
than:
  
 SELECT * FROM table WHERE table.type <> 0;
  
 I've been playing with the Query Browser and checking out the
 optimization documents and haven't been able to make a clear call on
 whether or not this is also the case with MySQL/InnoDB.
  
 TIA,
  
 R.
  
 

YES, YES, YES! This is definitely an optimization.   

When you say IN or =, you are asking for matching values. Matches can
come from indexes. When you say <> or NOT IN, you are asking for
everything BUT matches.  In order to evaluate a negative, the database
engine (and this is usually true regardless of database server) almost
always performs a full table scan to test every row to make sure it is
either <> or NOT IN. At the very best, they have to perform a full index
scan which is still less efficient than ranged or values-based lookups.


It's when you get into the situation where you are matching against
dozens of IN-clause items that you may run into slowdowns again. Until
you reach  2 or 3 dozen terms (depending on your hardware) you should be
faster with an IN comparison than a <> or a NOT IN comparison. An
optimization to search for all BUT a term or two is to create a temporary
table of all of your terms and delete the exact ones you want to
exclude. Put an index on your temp table then JOIN that back into your
query again (replacing the huge IN clause).  The database will match
index to index and things will get fast again. This technique can scale
up to some really big queries. 

Always try to code for the affirmative tests. Your users will thank you.


Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine





RE: Query Optimization Question

2006-03-13 Thread SGreen
Yes, a ranged query should respond faster than a negation. In some cases 
you can seriously improve query performance for a negation query if you 
split it into two range queries unioned together. Here is a pseudo 
example:

This query should be slow due to the table scan it takes to test the 
condition:

SELECT ...
FROM ...
WHERE indexfield <> 16

This query should be noticeably faster due to using ranged queries 
(partial index ranged matches):

(   SELECT ...
FROM ...
WHERE indexfield < 16
) UNION (
SELECT ...
FROM ...
WHERE indexfield > 16
)

Of course, if you have to do a table scan ANYWAY (because a value you have 
in a constraint is not in an index) then this won't help. This 
optimization is all about getting the engine to use an index whenever it 
can instead of performing a table scan. Of course, that is not always 
possible.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Robert DiFalco [EMAIL PROTECTED] wrote on 03/13/2006 10:48:29 AM:

 Shawn,
 
 Any performance gains for specifying type > 0 than type <> 0 ?
 
 R.
 
 
 
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
 Sent: Monday, March 13, 2006 6:37 AM
 To: Robert DiFalco
 Cc: mysql@lists.mysql.com
 Subject: Re: Query Optimization Question
 
 
 
 
 Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43
 PM:
 
  In a previous database engine I was using, an IN was more optimal than
 a
  <>. So, for example:
  
  SELECT * FROM table WHERE table.type IN (1,2,3);
  
  Where the possible values of type are 0-3, was appreciably faster
 than:
  
  SELECT * FROM table WHERE table.type <> 0;
  
  I've been playing with the Query Browser and checking out the
  optimization documents and haven't been able to make a clear call on
  whether or not this is also the case with MySQL/InnoDB.
  
  TIA,
  
  R.
  
  
 
 YES, YES, YES! This is definitely an optimization. 
 
 When you say IN or =, you are asking for matching values. Matches can
 come from indexes. When you say <> or NOT IN, you are asking for
 everything BUT matches.  In order to evaluate a negative, the database
 engine (and this is usually true regardless of database server) almost
 always performs a full table scan to test every row to make sure it is
 either <> or NOT IN. At the very best, they have to perform a full index
 scan which is still less efficient than ranged or values-based lookups.
 
 
 It's when you get into the situation where you are matching against
 dozens of IN-clause items that you may run into slowdowns again. Until
 you reach  2 or 3 dozen terms (depending on your hardware) you should be
 faster with an IN comparison than a <> or a NOT IN comparison. An
 optimization to search for all BUT a term or two is to create a temporary
 table of all of your terms and delete the exact ones you want to
 exclude. Put an index on your temp table then JOIN that back into your
 query again (replacing the huge IN clause).  The database will match
 index to index and things will get fast again. This technique can scale
 up to some really big queries. 
 
 Always try to code for the affirmative tests. Your users will thank you.
 
 
 Shawn Green
 Database Administrator
 Unimin Corporation - Spruce Pine
 


RE: Query Optimization Question

2006-03-13 Thread Robert DiFalco
Interesting, that seems like an optimization the query optimizer could
do itself when it sees a <> operator on an indexed numeric.



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 13, 2006 8:01 AM
To: Robert DiFalco
Cc: mysql@lists.mysql.com
Subject: RE: Query Optimization Question



Yes, a ranged query should respond faster than a negation. In some cases
you can seriously improve query performance for a negation query if you
split it into two range queries unioned together. Here is a pseudo
example: 

This query should be slow due to the table scan it takes to test the
condition: 

SELECT ... 
FROM ... 
WHERE indexfield <> 16 

This query should be noticeably faster due to using ranged queries
(partial index ranged matches): 

(SELECT ... 
FROM ... 
WHERE indexfield < 16 
) UNION ( 
SELECT ... 
FROM ... 
WHERE indexfield > 16 
) 

Of course, if you have to do a table scan ANYWAY (because a value you
have in a constraint is not in an index) then this won't help. This
optimization is all about getting the engine to use an index whenever it
can instead of performing a table scan. Of course, that is not always
possible. 

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine 

Robert DiFalco [EMAIL PROTECTED] wrote on 03/13/2006 10:48:29
AM:

 Shawn,
  
 Any performance gains for specifying type > 0 than type <> 0 ?
  
 R.
 
 
 
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
 Sent: Monday, March 13, 2006 6:37 AM
 To: Robert DiFalco
 Cc: mysql@lists.mysql.com
 Subject: Re: Query Optimization Question
 
 
 
 
 Robert DiFalco [EMAIL PROTECTED] wrote on 03/11/2006 12:43:43
 PM:
 
  In a previous database engine I was using, an IN was more optimal
than
 a
  <>. So, for example:
   
  SELECT * FROM table WHERE table.type IN (1,2,3);
   
  Where the possible values of type are 0-3, was appreciably faster
 than:
   
  SELECT * FROM table WHERE table.type <> 0;
   
  I've been playing with the Query Browser and checking out the
  optimization documents and haven't been able to make a clear call on
  whether or not this is also the case with MySQL/InnoDB.
   
  TIA,
   
  R.
   
  
 
 YES, YES, YES! This is definitely an optimization.   
 
 When you say IN or =, you are asking for matching values. Matches
can
 come from indexes. When you say <> or NOT IN, you are asking for
 everything BUT matches.  In order to evaluate a negative, the database
 engine (and this is usually true regardless of database server) almost
 always performs a full table scan to test every row to make sure it is
 either <> or NOT IN. At the very best, they have to perform a full
index
 scan which is still less efficient than ranged or values-based
lookups.
 
 
 It's when you get into the situation where you are matching against
 dozens of IN-clause items that you may run into slowdowns again. Until
 you reach  2 or 3 dozen terms (depending on your hardware) you should
be
 faster with an IN comparison than a <> or a NOT IN comparison. An
 optimization to search for all BUT a term or two is to create a temporary
 table of all of your terms and delete the exact ones you want to
 exclude. Put an index on your temp table then JOIN that back into your
 query again (replacing the huge IN clause).  The database will match
 index to index and things will get fast again. This technique can
scale
 up to some really big queries. 
 
 Always try to code for the affirmative tests. Your users will thank
you.
 
 
 Shawn Green
 Database Administrator
 Unimin Corporation - Spruce Pine
 



Re: Query Optimization Question

2006-03-13 Thread Michael Stassen

Robert DiFalco wrote:
 In a previous database engine I was using, an IN was more optimal than a
 <>. So, for example:

 SELECT * FROM table WHERE table.type IN (1,2,3);

 Where the possible values of type are 0-3, was appreciably faster than:

 SELECT * FROM table WHERE table.type <> 0;

 I've been playing with the Query Browser and checking out the
 optimization documents and haven't been able to make a clear call on
 whether or not this is also the case with MySQL/InnoDB.

Mladen Adamovic wrote:
 IN should be faster implemented with both hash tables and BTREEs, so
 nowadays it should also be faster than <>, as all MySQL implementations
 AFAIK use those well-known data structures for indexes.

[EMAIL PROTECTED] wrote:
 YES, YES, YES! This is definitely an optimization.

 When you say IN or =, you are asking for matching values. Matches can
 come from indexes. When you say <> or NOT IN, you are asking for
 everything BUT matches.  In order to evaluate a negative, the database
 engine (and this is usually true regardless of database server) almost
 always performs a full table scan to test every row to make sure it is
 either <> or NOT IN. At the very best, they have to perform a full index
 scan which is still less efficient than ranged or values-based lookups.

 It's when you get into the situation where you are matching against dozens
 of IN-clause items that you may run into slowdowns again. Until you reach
 2 or 3 dozen terms (depending on your hardware) you should be faster with
 an IN comparison than a <> or a NOT IN comparison. An optimization to
 search for all BUT a term or two is to create a temporary table of all of your
 terms and delete the exact ones you want to exclude. Put an index on your
 temp table then JOIN that back into your query again (replacing the huge
 IN clause).  The database will match index to index and things will get
 fast again. This technique can scale up to some really big queries.

 Always try to code for the affirmative tests. Your users will thank you.

Implicit in Mladen and Shawn's answers, but never actually mentioned in the 
original post, is the presence of an index on the type column.  This is probably 
obvious to all concerned, but I mention it for completeness: without an index on 
type, there is no difference between type IN (1,2,3) and type != 0.  That 
is, the question is not whether IN is better than !=, but rather which will 
allow the optimizer to make good use of the index on type.


I find mysql's optimizer is pretty good with well-written queries, as long as 
subqueries aren't involved, so my initial reaction was to expect no difference. 
 After all, as the optimizer considers the WHERE conditions and the available 
indexes, it is certainly possible, at least theoretically, for it to notice that 
type IN (1,2,3) and type != 0 are identical conditions.  That is, a clever 
optimizer could treat them identically.  Shawn's and Mladen's answers gave me 
pause, however, and aroused my curiosity, so I decided to test:


  SELECT VERSION();
  +---+
  | VERSION() |
  +---+
  | 4.1.15|
  +---+

  SELECT cat, COUNT(*) FROM inits GROUP BY cat;
  +--+--+
  | type | COUNT(*) |
  +--+--+
  |0 |44224 |
  |1 | 1919 |
  |2 | 1931 |
  |3 | 1926 |
  +--+--+

  mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (1,2,3) \G
  *** 1. row ***
 id: 1
select_type: SIMPLE
  table: inits
   type: range
  possible_keys: cat_idx
key: cat_idx
key_len: 5
ref: NULL
   rows: 8117
  Extra: Using where

  mysql> EXPLAIN SELECT * FROM inits WHERE cat != 0 \G
  *** 1. row ***
 id: 1
select_type: SIMPLE
  table: inits
   type: range
  possible_keys: cat_idx
key: cat_idx
key_len: 5
ref: NULL
   rows: 8120
  Extra: Using where

As you can see, the optimizer plans to use the index in both cases, examining 
8117 rows in the IN case and 8120 rows in the != case, to get 5776 matching rows 
out of 50,000 (about 11.55%).


On the other hand, it makes a difference how many rows will match.  What is the 
distribution of values of type?  If the number of matching rows is more than 
about 30% of the table, the optimizer won't use an available index in any case. 
 For example,


mysql> EXPLAIN SELECT * FROM inits WHERE cat IN (0,1,2) \G
*** 1. row ***
   id: 1
  select_type: SIMPLE
table: inits
 type: ALL
possible_keys: cat_idx
  key: NULL
  key_len: NULL
  ref: NULL
 rows: 5
Extra: Using where

mysql> EXPLAIN SELECT * FROM inits WHERE cat != 3 \G
*** 1. row ***
   id: 1
  select_type: SIMPLE
table: 

Query Optimization Question

2006-03-11 Thread Robert DiFalco
In a previous database engine I was using, an IN was more optimal than a
<>. So, for example:
 
SELECT * FROM table WHERE table.type IN (1,2,3);
 
Where the possible values of type are 0-3, was appreciably faster than:
 
SELECT * FROM table WHERE table.type <> 0;
 
I've been playing with the Query Browser and checking out the
optimization documents and haven't been able to make a clear call on
whether or not this is also the case with MySQL/InnoDB.
 
TIA,
 
R.
 





Re: Query Optimization Question

2006-03-11 Thread Mladen Adamovic

Robert DiFalco wrote:

In a previous database engine I was using, an IN was more optimal than a
 <>. So, for example:
 
SELECT * FROM table WHERE table.type IN (1,2,3);
 
Where the possible values of type are 0-3, was appreciably faster than:
 
SELECT * FROM table WHERE table.type <> 0;
  
IN should be faster implemented with both hash tables and BTREEs, so 
nowadays it should also be faster than <>, as all MySQL implementations 
AFAIK use those well-known data structures for indexes.



 
I've been playing with the Query Browser and checking out the

optimization documents and haven't been able to make a clear call on
whether or not this is also the case with MySQL/InnoDB.
 
TIA,
 
R.
 



  


--
Mladen Adamovic
http://home.blic.net/adamm
http://www.shortopedia.com 
http://www.froola.com 







Help with Query Optimization

2006-02-28 Thread Kishore Jalleda
Hi All,
 We use Vbulletin for our message boards, and I have a query which
takes approximately 1.2 - 1.5 seconds to execute on a table ('thread')
having around 130,000 rows. It is as follows:

SELECT threadid
FROM thread AS thread
LEFT JOIN deletionlog AS deletionlog ON ( thread.threadid = deletionlog.
primaryid
AND TYPE = 'thread' )
WHERE forumid =98
AND sticky =0
AND visible =1
AND deletionlog.primaryid IS NULL
ORDER BY sticky DESC , lastpost DESC
LIMIT 0 , 15

There is an index on the field 'lastpost' ( Type: INT(10) ) , also Explain
Select gives me this..

id | select_type | table       | type   | possible_keys | key      | key_len | ref                            | rows   | Extra
 1 | SIMPLE      | thread      | index  | NULL          | lastpost | 4       | NULL                           | 112783 | Using where
 1 | SIMPLE      | deletionlog | eq_ref | PRIMARY       | PRIMARY  | 5       | foxbox16.thread.threadid,const | 1      | Using where; Using index; Not exists


I am no SQL guru, so could somebody kindly help me optimize this query so
that it runs faster, as during peak times it slows the DB down a
lot?

Kishore Jalleda


Re: Help with Query Optimization

2006-02-28 Thread Peter Brawley




Kishore,

 We use Vbulletin for our message boards, and I have a
query which
takes approximately 1.2 - 1.5 seconds to execute, on a table
('thread')
having around 130,000 rows, it is as follows

SELECT threadid
FROM thread 
LEFT JOIN deletionlog
ON ( thread.threadid = deletionlog.primaryid AND TYPE = 'thread' )
WHERE forumid = 98
 AND sticky = 0
 AND visible = 1
 AND deletionlog.primaryid IS NULL
ORDER BY sticky DESC , lastpost DESC
LIMIT 0 , 15

Your formatting of EXPLAIN output is a bit hard to read, but did you
try indexes on 
(sticky,lastpost) and (forumid,sticky,visible)? And depending on how
big deletionlog is, 
you might move ...
 AND deletionlog.primaryid IS NULL
from the WHERE clause to the ON clause, to reduce the number of rows
the query engine 
has to write to its temp table.

PB
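
A concrete sketch of those index suggestions (the index names are invented; 
compare EXPLAIN output before and after to confirm they help):

```sql
ALTER TABLE thread
  ADD INDEX forum_sticky_visible (forumid, sticky, visible),
  ADD INDEX sticky_lastpost (sticky, lastpost);
```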

-

Kishore Jalleda wrote:

  Hi All,
 We use Vbulletin for our message boards, and I have a query which
takes approximately 1.2 - 1.5 seconds to execute, on a table ('thread')
having around 130,000 rows, it is as follows

SELECT threadid
FROM thread AS thread
LEFT JOIN deletionlog AS deletionlog ON ( thread.threadid = deletionlog.
primaryid
AND TYPE = 'thread' )
WHERE forumid =98
AND sticky =0
AND visible =1
AND deletionlog.primaryid IS NULL
ORDER BY sticky DESC , lastpost DESC
LIMIT 0 , 15

There is an index on the field 'lastpost' ( Type: INT(10) ) , also Explain
Select gives me this..

id | select_type | table       | type   | possible_keys | key      | key_len | ref                            | rows   | Extra
 1 | SIMPLE      | thread      | index  | NULL          | lastpost | 4       | NULL                           | 112783 | Using where
 1 | SIMPLE      | deletionlog | eq_ref | PRIMARY       | PRIMARY  | 5       | foxbox16.thread.threadid,const | 1      | Using where; Using index; Not exists


I am no SQL guru, so could somebody kindly help me optimize this query so
that it runs faster, as during peak times it slows the DB down a
lot?

Kishore Jalleda

  
  

  






Help with query optimization query SUM

2006-02-06 Thread Reynier Perez Mira
Hi:
I'm developing a simple shopping cart. I have these two tables:
carro_de_compras
--
IDU int(11) NOT NULL
IDA int(11) NOT NULL
CantidadDeArticulos int(11) NOT NULL

os_articulo
--
IDA int(11) NOT NULL auto_increment,
IDC int(11) NOT NULL default '0',
ANombre varchar(200) NOT NULL default '',
ADescripcion text,
ACantidad int(11) NOT NULL default '0',
AImagen varchar(50) default NULL,
IDU int(11) NOT NULL default '0',
APrecio float(6,2) default NULL,
KEY AI_IDA (`IDA`)

Before asking, let me explain some things. As you can see in the tables, I have the 
same field IDU in both tables. In the first (table carro_de_compras) it is the ID 
of the user logged on to the ecommerce system; the second is the ID of the user who 
uploaded articles for sale. Something like eBay, in which you can sell and buy at any 
time. Now I arrive at the point at which I need to optimize queries:

PHP Code:
-
$sql = mysql_query("SELECT * FROM carro_de_compras"); 
$sresultado = mysql_fetch_assoc($sql);

$query = mysql_query("SELECT * FROM os_articulo WHERE 
(IDA='".$sresultado['IDA']."')"); 
while ($record = mysql_fetch_assoc($query)) {  
 $productos[] = $record; 
}

The question for this problem is: is there any way to optimize this query and 
leave it in only one line? I read about it in the MySQL docs and found something 
about JOIN, but I can't understand how it works. Maybe because I'm Cuban and don't 
understand English as well as I would like.

The other question is how to add some value to a field. For example:
$sql = mysql_query("UPDATE table SET value=value+1 WHERE id='1'");

To do this query I do this:
$sql = mysql_query("SELECT value FROM table WHERE id='1'");
$result = mysql_fetch_assoc($sql);
$update = mysql_query("UPDATE table SET value=('".$result['value']."' + 1) 
WHERE id='1'");

So is possible to optimize this query?


Regards
ReynierPM
4th-year Computer Engineering student
Registered Linux user: #310201
*
The superhero programmer learns by sharing his knowledge. 
He is the reference point for his colleagues. Everyone comes to ask him and he, 
secretly, encourages it, because that is how he acquires his legendary
wisdom: by listening to and helping others... 




Re: Help with query optimization query SUM

2006-02-06 Thread سيد هادی راستگوی حقی
Dear Reynier,

You can use a JOIN on both of your tables.
The JOIN has to run on the same fields, i.e. IDA.

SELECT * FROM carro_de_compras LEFT JOIN os_articulo ON carro_de_compras.IDA
= os_articulo.IDA

This query returns all your users with their articles, if any, and you can
iterate over it.

but one note:
Use an INDEX on both tables, or you may encounter problems as your row counts grow.

about the UPDATE query:

UPDATE table SET value=value+1 WHERE id='1'

is enough, use that.
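
Folding the two PHP round-trips into one statement, the cart query might then 
look like this (the IDU value 42 is just a placeholder for the logged-in 
user's ID):

```sql
SELECT a.*, c.CantidadDeArticulos
FROM carro_de_compras AS c
JOIN os_articulo AS a ON a.IDA = c.IDA
WHERE c.IDU = 42;
```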


On 2/7/06, Reynier Perez Mira [EMAIL PROTECTED] wrote:

 Hi:
 I'm developing a simple shopping cart. I have these two tables:
 carro_de_compras
 --
 IDU int(11) NOT NULL
 IDA int(11) NOT NULL
 CantidadDeArticulos int(11) NOT NULL

 os_articulo
 --
 IDA int(11) NOT NULL auto_increment,
 IDC int(11) NOT NULL default '0',
 ANombre varchar(200) NOT NULL default '',

ADescripcion text,

ACantidad int(11) NOT NULL default '0',

AImagen varchar(50) default NULL,

IDU int(11) NOT NULL default '0',

APrecio float(6,2) default NULL,

KEY AI_IDA (`IDA`)

 Before asking, let me explain some things. As you can see in the tables, I have
 the same field IDU in both tables. In the first (table carro_de_compras) it is
 the ID of the user logged on to the ecommerce system; the second is the ID of the
 user who uploaded articles for sale. Something like eBay, in which you can sell and
 buy at any time. Now I arrive at the point at which I need to optimize queries:

 PHP Code:
 -
 $sql = mysql_query("SELECT * FROM carro_de_compras");
 $sresultado = mysql_fetch_assoc($sql);

 $query = mysql_query("SELECT * FROM os_articulo WHERE
 (IDA='".$sresultado['IDA']."')");
 while ($record = mysql_fetch_assoc($query)) {
 $productos[] = $record;
 }

 The question for this problem is: is there any way to optimize this query
 and leave it in only one line? I read about it in the MySQL docs and found
 something about JOIN, but I can't understand how it works. Maybe because I'm
 Cuban and don't understand English as well as I would like.

 The other question is how to add some value to a field. For example:
 $sql = mysql_query("UPDATE table SET value=value+1 WHERE id='1'");

 To do this query I do this:
 $sql = mysql_query("SELECT value FROM table WHERE id='1'");
 $result = mysql_fetch_assoc($sql);
 $update = mysql_query("UPDATE table SET value=('".$result['value']."' + 1)
 WHERE id='1'");

 So is possible to optimize this query?


 Regards
 ReynierPM
 4th-year Computer Engineering student
 Registered Linux user: #310201
 *
 The superhero programmer learns by sharing his knowledge.
 He is the reference point for his colleagues. Everyone comes to ask him and he,
 secretly, encourages it, because that is how he acquires his legendary
 wisdom: by listening to and helping others...





--
Sincerely,
Hadi Rastgou


Re: Query optimization

2006-01-13 Thread Joerg Bruehe

Hi Tripp, all!


Tripp Bishop wrote:

[[...]]

Why would adding extra fields to the query slow the
query down so much? In either case it looks like it's
using the same index.


If a query is to return only fields contained in the index, it can be 
executed as an index-only query, and the true data need not be accessed.


The moment you ask for a field not contained in the index, for each 
index entry matching the search condition, the base data need to be 
accessed to get that field.


This may mean very significant additional costs.
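
To illustrate with a made-up table: suppose `listings` has a composite 
INDEX (city, zip).

```sql
-- Index-only: every selected column lives in the index, so EXPLAIN
-- reports "Using index" and the data rows are never touched.
SELECT city, zip FROM listings WHERE city = 'Portland';

-- One extra, non-indexed column forces a row lookup for every
-- matching index entry, which can cost far more.
SELECT city, zip, price FROM listings WHERE city = 'Portland';
```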

HTH,
Joerg


PS:
In my reader, it looked like your original question hijacked another 
thread: Question regarding running 'mysql' from a bash script.

:-(

--
Joerg Bruehe, Senior Production Engineer
MySQL AB, www.mysql.com




Mysql Query Optimization?

2006-01-13 Thread Scott Baker
I have a query that's taking about 0.3 seconds to run and it should
be faster I think. I've read all about query optimization and
forcing indexes and I'm not entirely sure why I'm having this
problem. Google finds all kinds of basic information like using
indexes, and not doing math on the query fields. Stupid stuff I'm
not doing.

I have two tables EmailAddr and CustInfo. That look like this:

+--+---+--+-+-+
| Field| Type  | Null | Key | Default |
+--+---+--+-+-+
| EmailID  | mediumint(8) unsigned |  | PRI | NULL|
| EmailUsername| varchar(25)   | YES  | MUL | NULL|

| EmailPassword| varchar(50)   | YES  | | NULL|
| CustID   | mediumint(8) unsigned |  | MUL | 0   |

+--+---+--+-+-+

And

+-+---+--+-+---+
| Field   | Type  | Null | Key | Default   |
+-+---+--+-+---+
| CustID  | mediumint(8)  |  | PRI | 0 |
| CustAddress | varchar(150)  | YES  | | NULL  |
| CustCity| varchar(50)   | YES  | | NULL  |
| CustState   | varchar(50)   | YES  | | NULL  |
| etc...   |
+-+---+--+-+---+

When I select all the records from either table it's 0.02 seconds.
There are about 10,000 records in each table. If I try and join the
Email table and CustInfo table on CustID (which is an indexed field
in both tables) the query time jumps to 0.3 seconds.

SELECT EmailID, e.CustID
FROM EmailAddr e, CustInfo c
WHERE e.CustID = c.CustID;

If I tell it to use the index primary it's fast (0.02 seconds)

SELECT EmailID, e.CustID
FROM EmailAddr e USE INDEX (Primary), CustInfo c
WHERE e.CustID = c.CustID;

#1) Why can't it find that relationship itself...
#2) Why does telling it to use the Primary key for EmailAddr help at
all!?! It's not even a query/join field.

Scott




Re: Mysql Query Optimization?

2006-01-13 Thread Gleb Paharenko
Hello.

Please, provide the output of EXPLAIN for your queries. The performance
should improve if you make e.CustID and c.CustID the same type (one of
them is unsigned). Have you run ANALYZE TABLE on your tables?
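
Concretely, those two suggestions might look like this (a sketch; take a 
backup before altering a production table):

```sql
-- Make the join columns identical types; CustInfo.CustID is signed above,
-- while EmailAddr.CustID is MEDIUMINT(8) UNSIGNED.
ALTER TABLE CustInfo MODIFY CustID MEDIUMINT(8) UNSIGNED NOT NULL DEFAULT 0;

-- Refresh the index statistics the optimizer uses to pick a join order.
ANALYZE TABLE EmailAddr, CustInfo;
```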


Scott Baker wrote:
 I have a query that's taking about 0.3 seconds to run and it should
 be faster I think. I've read all about query optimization and
 forcing indexes and I'm not entirely sure why I'm having this
 problem. Google finds all kinds of basic information like using
 indexes, and not doing math on the query fields. Stupid stuff I'm
 not doing.
 
 I have two tables EmailAddr and CustInfo. That look like this:
 
 +--+---+--+-+-+
 | Field| Type  | Null | Key | Default |
 +--+---+--+-+-+
 | EmailID  | mediumint(8) unsigned |  | PRI | NULL|
 | EmailUsername| varchar(25)   | YES  | MUL | NULL|
 
 | EmailPassword| varchar(50)   | YES  | | NULL|
 | CustID   | mediumint(8) unsigned |  | MUL | 0   |
 
 +--+---+--+-+-+
 
 And
 
 +-+---+--+-+---+
 | Field   | Type  | Null | Key | Default   |
 +-+---+--+-+---+
 | CustID  | mediumint(8)  |  | PRI | 0 |
 | CustAddress | varchar(150)  | YES  | | NULL  |
 | CustCity| varchar(50)   | YES  | | NULL  |
 | CustState   | varchar(50)   | YES  | | NULL  |
 | etc...   |
 +-+---+--+-+---+
 
 When I select all the records from either table it's 0.02 seconds.
 There are about 10,000 records in each table. If I try and join the
 Email table and CustInfo table on CustID (which is an indexed field
 in both tables) the query time jumps to 0.3 seconds.
 
 SELECT EmailID, e.CustID
 FROM EmailAddr e, CustInfo c
 WHERE e.CustID = c.CustID;
 
 If I tell it to use the index primary it's fast (0.02 seconds)
 
 SELECT EmailID, e.CustID
 FROM EmailAddr e USE INDEX (Primary), CustInfo c
 WHERE e.CustID = c.CustID;
 
 #1) Why can't it find that relationship itself...
 #2) Why does telling it to use the Primary key for EmailAddr help at
 all!?! It's not even a query/join field.
 
 Scott


-- 
For technical support contracts, goto https://order.mysql.com/?ref=ensita
This email is sponsored by Ensita.NET http://www.ensita.net/
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Gleb Paharenko
 / /|_/ / // /\ \/ /_/ / /__   [EMAIL PROTECTED]
/_/  /_/\_, /___/\___\_\___/   MySQL AB / Ensita.NET
   ___/   www.mysql.com




Re: Mysql Query Optimization?

2006-01-13 Thread Scott Baker
It looks like I spoke too soon. That field fix sped up the straight
join, but not the more complex query I had; it's still 0.3 seconds.

SELECT EmailID, EmailUsername, d.DomainName, e.CustID
FROM EmailAddr e, Domain d, CustInfo c
WHERE e.CustID = c.CustID AND e.DomainID = d.DomainID AND
(EmailUsername LIKE '%thompson%')
LIMIT 10;

This explains out like this

Slow (0.3 seconds)
+---++-+-+-++--+-+
| table | type   | possible_keys   | key | key_len | ref
| rows | Extra   |
+---++-+-+-++--+-+
| c | index  | PRIMARY | PRIMARY |   3 | [NULL]
| 9371 | Using index |
| e | ref| DomainID,CustID | CustID  |   3 | c.CustID
|1 | Using where |
| d | eq_ref | PRIMARY | PRIMARY |   2 | e.DomainID
|1 | |
+---++-+-+-++--+-+

However if I force the index on EmailAddr:

SELECT EmailID, EmailUsername, d.DomainName, e.CustID
FROM EmailAddr e USE INDEX(Primary), Domain d, CustInfo c
WHERE e.CustID = c.CustID AND e.DomainID = d.DomainID AND
(EmailUsername LIKE '%thompson%')
LIMIT 10;

It's fast (0.02 seconds). So it's like Mysql isn't putting the
tables in the right order. Does it have to do with the order I put
the fields/tables in the query? The best way would be for mysql to
do a full table scan for anything that matches that LIKE query...
which should return 8 records, and then join on the CustID. As
opposed to doing the join first and joining 10,000 records, and then
grepping all those.

Fast (0.02 seconds)
+---++---+-+-++---+-+
| table | type   | possible_keys | key | key_len | ref|
rows  | Extra   |
+---++---+-+-++---+-+
| e | ALL| [NULL]| [NULL]  |  [NULL] | [NULL] |
12413 | Using where |
| d | eq_ref | PRIMARY   | PRIMARY |   2 | e.DomainID |
1 | |
| c | eq_ref | PRIMARY   | PRIMARY |   3 | e.CustID   |
1 | Using index |
+---++---+-+-++---+-+

Scott
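
[Editorial aside, not from the thread: the join order Scott wants — scan
EmailAddr first, then probe the other tables by primary key — can also be
pinned explicitly with STRAIGHT_JOIN instead of a USE INDEX hint. A sketch:]

```sql
-- STRAIGHT_JOIN forces MySQL to join the tables in the order listed,
-- so EmailAddr is scanned first and Domain/CustInfo are probed by key.
SELECT STRAIGHT_JOIN EmailID, EmailUsername, d.DomainName, e.CustID
FROM EmailAddr e, Domain d, CustInfo c
WHERE e.CustID = c.CustID
  AND e.DomainID = d.DomainID
  AND EmailUsername LIKE '%thompson%'
LIMIT 10;
```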

Gleb Paharenko wrote:
 Hello.
 
 Please, provide the output of EXPLAIN for your queries. The performance
 should improve if you make e.CustID and c.CustID the same type (one of
 them is unsigned). Have you run ANALYZE TABLE on your tables?
 
 
 Scott Baker wrote:
 I have a query that's taking about 0.3 seconds to run and it should
 be faster I think. I've read all about query optimization and
 forcing indexes and I'm not entirely sure what I'm having this
 problem. Google finds all kinds of basic information like using
 indexes, and not doing math on the query fields. Stupid stuff I'm
 not doing.

 I have two tables EmailAddr and CustInfo. That look like this:

 +--+---+--+-+-+
 | Field| Type  | Null | Key | Default |
 +--+---+--+-+-+
 | EmailID  | mediumint(8) unsigned |  | PRI | NULL|
 | EmailUsername| varchar(25)   | YES  | MUL | NULL|

 | EmailPassword| varchar(50)   | YES  | | NULL|
 | CustID   | mediumint(8) unsigned |  | MUL | 0   |

 +--+---+--+-+-+

 And

 +-+---+--+-+---+
 | Field   | Type  | Null | Key | Default   |
 +-+---+--+-+---+
 | CustID  | mediumint(8)  |  | PRI | 0 |
 | CustAddress | varchar(150)  | YES  | | NULL  |
 | CustCity| varchar(50)   | YES  | | NULL  |
 | CustState   | varchar(50)   | YES  | | NULL  |
 | etc...   |
 +-+---+--+-+---+

 Where I select all the records from either table it's 0.02 seconds.
 There are about 10,000 records in each table. If I try and join the
 Email table and CustInfo table on CustID (which is in indexed field
 in both tables) the query time jumps to 0.3 seconds.

 SELECT EmailID, e.CustID
 FROM EmailAddr e, CustInfo c
 WHERE e.CustID = c.CustID;

 If I tell it to use the index primary it's fast (0.02 seconds)

 SELECT EmailID, e.CustID
 FROM EmailAddr e USE INDEX (Primary), CustInfo c
 WHERE e.CustID = c.CustID;

 #1) Why can't it find that relationship itself...
 #2) Why does telling it to use the Primary key for EmailAddr help at
 all!?! It's not even a query/join field.

 Scott
 
 


Re: Mysql Query Optimization?

2006-01-13 Thread Scott Baker
Ops, forgot to reply to this list

Gleb:

+---+---+---+-+-+--+--+-+
| table | type  | possible_keys | key | key_len | ref  |
rows | Extra   |
+---+---+---+-+-+--+--+-+
| c | index | PRIMARY   | PRIMARY |   3 | [NULL]   |
9371 | Using index |
| e | ref   | CustID| CustID  |   3 | c.CustID |
1 | |
+---+---+---+-+-+--+--+-+

Hopefully that won't come across too funky for you. Anyway I ended
up changing the CustID field in the EmailAddr table to NOT be
unsigned (so both the field types are the same) and it came back
immediately. So it appears from my preliminary tests that the
problem was that the field types didn't match.

What a weird problem, I never would have thought of that. Good catch!

Scott
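
[Editorial aside: the fix Scott describes amounts to one ALTER so the join
columns have identical types, signedness included. The exact definition below
is an assumption based on the DESCRIBE output quoted in this thread:]

```sql
-- Make EmailAddr.CustID a signed MEDIUMINT like CustInfo.CustID, so the
-- optimizer can compare index values directly without a type conversion.
ALTER TABLE EmailAddr MODIFY CustID MEDIUMINT(8) NOT NULL DEFAULT 0;
```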

Gleb Paharenko wrote:
 Hello.
 
 Please, provide the output of EXPLAIN for your queries. The performance
 should improved if you make e.CustID and c.CustID the same type (one of
 them is unsigned). Have you run ANALYZE TABLE on your tables?
 
 
 Scott Baker wrote:
 I have a query that's taking about 0.3 seconds to run and it should
 be faster I think. I've read all about query optimization and
 forcing indexes and I'm not entirely sure what I'm having this
 problem. Google finds all kinds of basic information like using
 indexes, and not doing math on the query fields. Stupid stuff I'm
 not doing.

 I have two tables EmailAddr and CustInfo. That look like this:

 +--+---+--+-+-+
 | Field| Type  | Null | Key | Default |
 +--+---+--+-+-+
 | EmailID  | mediumint(8) unsigned |  | PRI | NULL|
 | EmailUsername| varchar(25)   | YES  | MUL | NULL|

 | EmailPassword| varchar(50)   | YES  | | NULL|
 | CustID   | mediumint(8) unsigned |  | MUL | 0   |

 +--+---+--+-+-+

 And

 +-+---+--+-+---+
 | Field   | Type  | Null | Key | Default   |
 +-+---+--+-+---+
 | CustID  | mediumint(8)  |  | PRI | 0 |
 | CustAddress | varchar(150)  | YES  | | NULL  |
 | CustCity| varchar(50)   | YES  | | NULL  |
 | CustState   | varchar(50)   | YES  | | NULL  |
 | etc...   |
 +-+---+--+-+---+

 Where I select all the records from either table it's 0.02 seconds.
 There are about 10,000 records in each table. If I try and join the
 Email table and CustInfo table on CustID (which is in indexed field
 in both tables) the query time jumps to 0.3 seconds.

 SELECT EmailID, e.CustID
 FROM EmailAddr e, CustInfo c
 WHERE e.CustID = c.CustID;

 If I tell it to use the index primary it's fast (0.02 seconds)

 SELECT EmailID, e.CustID
 FROM EmailAddr e USE INDEX (Primary), CustInfo c
 WHERE e.CustID = c.CustID;

 #1) Why can't it find that relationship itself...
 #2) Why does telling it to use the Primary key for EmailAddr help at
 all!?! It's not even a query/join field.

 Scott
 
 




Query optimization

2006-01-12 Thread Tripp Bishop
Howdy all,

I've got a question regarding optimizing a query.

I've got two tables:

listing and listing_city_xref

listing is a pretty wide table with many fields that
can be queried by users. There is a subset of fields
that are always in queries of this table but there are
many additional fields that may or may not be included.
The primary key consists of two fields, group_id int
and listing_id varchar(30).

The listing_city_xref table is very simple. It
consists of 3 fields: group_id, listing_id, city_id.
All three fields form the PK. There is an index on
group_id and listing_id as well as a foreign key that
points back to the listing table consisting of these
two fields.

I created an index on the listing table that contains
the group_id, listing_id, and the other fields that
all searches use.

When I run a query that joins the two tables on the
group_id and listing_id and I search only fields that
are included in the index that I mentioned the query
is really fast. If I add another field that's not
included in the index the query slows down by a factor
of 100. I can't simply index every field in the
listing table, so what can I do? I don't understand why
adding extra criteria to the query destroys its
performance. Any ideas?

Thanks,

Tripp

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 




Re: Query optimization

2006-01-12 Thread James Harvard
It might be a good idea if you could post the results of an EXPLAIN SELECT ... 
for a fast query and a slow query along with their actual SQL statements.

James Harvard

At 11:37 am -0800 12/1/06, Tripp Bishop wrote:
I've got a question regarding optimizing a query.




Re: Query optimization

2006-01-12 Thread Tripp Bishop
Jim, here's an example query:

SELECT COUNT(listing.listing_id) AS listing_count FROM
listing_city_xref INNER JOIN listing
ON listing.listing_id = listing_city_xref.listing_id
AND listing.group_id = listing_city_xref.group_id
WHERE listing_city_xref.group_id =2  
AND listing_city_xref.city_id IN (83,88) AND
listing.price BETWEEN 189000.00 AND 23.00 
AND tot_finished_sqft >= 1100 AND sqft >= 1000 AND
baths >= 1 AND bedrooms >= 3 AND garage_spaces >= 1
AND property_type IN ('RES');


I created an index on the following fields in the
listing table:
group_id, price, tot_finished_sqft, baths, bedrooms,
garage_spaces, property_type.

Here's the result of the explain on the query above:

|  1 | SIMPLE  | listing   | ref  |
PRIMARY, idx_search | idx_search | 4 | const | 8708 |
Using where|
|  1 | SIMPLE  | listing_wiz_city_xref | ref  |
PRIMARY, idx_listing, idx_wiz_city | PRIMARY| 36|
const, ahf_test.listing.listing_id |1 | Using
where; Using index |

This query took 12.91 seconds. Running it multiple
times produces similar run times.

Now, if I remove the sqft >= 1100 condition from the
where clause I get the following explain output:

|  1 | SIMPLE  | listing | range | PRIMARY,
idx_search | idx_search | 15  | NULL | 8688 |
Using where; Using index |
|  1 | SIMPLE  | listing_wiz_city_xref | ref   |
PRIMARY, idx_listing, idx_wiz_city | PRIMARY| 36  
   | const, ahf_test.listing.listing_id |1 | Using
where; Using index |

The query executes in 0.09 seconds. Running it
multiple times produces similar runtimes.

One interesting difference between the explain outputs
is that in the second case the ref is NULL. In the
first scenario it is const.

Why would adding extra fields to the query slow the
query down so much? In either case it looks like it's
using the same index.

Thanks,

Tripp
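
[No reply to Tripp appears in the archive, but the symptom — fast while every
referenced column is in idx_search, much slower once one is not — is
consistent with losing a covering index. A hypothetical sketch of extending
the index to cover the extra column (index name made up):]

```sql
-- Hypothetical: include sqft in the search index so the WHERE clause can
-- again be answered from the index alone (EXPLAIN shows "Using index").
ALTER TABLE listing
  ADD INDEX idx_search2 (group_id, price, tot_finished_sqft, sqft,
                         baths, bedrooms, garage_spaces, property_type);
```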






Re: Doubt about query optimization

2005-07-28 Thread Ronan Lucio

Eric,

Can you send us the actual show indexes from table and explain output that 
isn't shortened?


Thank you for answering my question.

Actually, the real query and tables are much more complex than
the data in the previous message.
I just tried to simplify the information so you could better understand the
trouble.


I think I found what's happening.
A SELECT WHERE city = 1 returns more rows than a full table scan
of the table front (the listings that appear on the front page).
So, it seems MySQL chooses the second option, since it has fewer rows
to optimize.

Thanks,
Ronan 







Doubt about query optimization

2005-07-27 Thread Ronan Lucio

Hello,

I'm using MySQL-4.1.7, and I have a query where it seems that,
even with a WHERE clause, a specific table isn't being optimized
properly:

I have four tables:

real_state: cod, name, city, ag_cod, sell
agency: cod, name, ag_status
front: cod, rs_cod, ag_cod
photo: cod, rs_cod

These table have the following indexes:

real_state:
   cod (pri)
   city
   ag_cod
agency:
   cod
   name
front:
   cod
   rs_cod
   ag_cod
photo
   cod
   rs_cod

When I EXPLAIN the query:

EXPLAIN
SELECT front.rs_cod, photo.cod, real_state.descr
FROM real_state, agency, front, photo_foto
WHERE real_state.city = 1
AND real_state.sell = 1
AND front.rs_cod = real_state.cod
AND photo.rs_cod = front.rs_cod
AND agency.cod = real_state.ag_cod
AND ag_status = 'A'

It shows me (in a short):

table        key      rows
==========   ======   ====
front        rs_cod   2085
real_state   cod      1
agency       cod      1
photo        rs_cod   1

But the trouble is: If I execute:

SELECT COUNT(*)
FROM real_state, front
WHERE real_state.city = 1
AND real_state.cod = front.rs_cod

Returns: 271

So, why isn't the index front.rs_cod being used?

Any help would be appreciated,

thank you,
Ronan 







Re: Doubt about query optimization

2005-07-27 Thread Eric Bergen
Can you send us the actual show indexes from table and explain output 
that isn't shortened?


Ronan Lucio wrote:


Hello,

I'm using MySQL-4.1.7, and I have a query where it seems that,
even with a WHERE clause, a specific table isn't being optimized
properly:

I have four tables:

real_state: cod, name, city, ag_cod, sell
agency: cod, name, ag_status
front: cod, rs_cod, ag_cod
photo: cod, rs_cod

These table have the following indexes:

real_state:
   cod (pri)
   city
   ag_cod
agency:
   cod
   name
front:
   cod
   rs_cod
   ag_cod
photo
   cod
   rs_cod

When I EXPLAIN the query:

EXPLAIN
SELECT front.rs_cod, photo.cod, real_state.descr
FROM real_state, agency, front, photo_foto
WHERE real_state.city = 1
AND real_state.sell = 1
AND front.rs_cod = real_state.cod
AND photo.rs_cod = front.rs_cod
AND agency.cod = real_state.ag_cod
AND ag_status = 'A'

It shows me (in a short):

table        key      rows
==========   ======   ====
front        rs_cod   2085
real_state   cod      1
agency       cod      1
photo        rs_cod   1

But the trouble is: If I execute:

SELECT COUNT(*)
FROM real_state, front
WHERE real_state.city = 1
AND real_state.cod = front.rs_cod

Returns: 271

So, why isn't the index front.rs_cod being used?

Any help would be appreciated,

thank you,
Ronan








Group By query optimization

2005-07-15 Thread Kishore Jalleda
Hi All,
  I have a MySQL query which takes 8 seconds to run on a dual
xeon 2.4, 3 GB RAM box:
SELECT gamename, MAX(score) AS score, COUNT(valid=1) AS played FROM
gamesessions AS gamesessions
WHERE valid=1 AND sessiontype IN (1,2)
GROUP BY gamename;
Explain select gives 

table         type  possible_keys  key   key_len  ref   rows    Extra
gamesessions  ALL   NULL           NULL  NULL     NULL  915522  Using where; Using temporary; Using filesort

How can I optimize the query?
Thanks 
Kishore




Re: Group By query optimization

2005-07-15 Thread Andrew Braithwaite
Hi,

Put indexes on 'valid' and 'sessiontype' and all will be good.

Cheers,

Andrew
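
[Editorial aside: Andrew's suggestion, sketched out. Index names are made up;
a single composite index is an alternative worth testing, since MySQL of this
era uses at most one index per table per query:]

```sql
-- Separate indexes, as suggested:
ALTER TABLE gamesessions ADD INDEX idx_valid (valid);
ALTER TABLE gamesessions ADD INDEX idx_sessiontype (sessiontype);

-- Or one composite index covering both WHERE columns:
ALTER TABLE gamesessions ADD INDEX idx_valid_type (valid, sessiontype);
```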


On 15/7/05 18:26, Kishore Jalleda [EMAIL PROTECTED] wrote:

 Hi All,
   I have a MySQL query which takes 8 seconds to run on a dual
 xeon 2.4, 3Gig ram box,
 SELECT gamename, MAX(score) AS score, COUNT(valid=1) AS played FROM
 gamesessions AS gamesessions
 WHERE valid=1 AND sessiontype IN (1,2)
 GROUP BY gamename;
 Explain select gives
 
 table  type  possible_keys  key  key_len  ref  rows  Extra
 gamesessions  ALL  NULL  NULL  NULL  NULL  915522  Using where; Using
 temporary; Using filesort
 
 How can I optimize the query
 Thanks 
 Kishore
 
 






Select Query Optimization - 1 minute long :(

2005-07-12 Thread Scott Klarenbach
The following query is in desperate need of optimization.  Any gurus
out there who can share some insights, I'd greatly appreciate it.

I have a request table, from which I'm pulling all records.  Each
request record has a part number, and for each requested part number,
I'd like to query the inventory table for all matching inventory items
for that part number, and calculate the potential value of the
inventory.  Each inventory record has 4 price fields, and it is the
HIGHEST value of these fields used in the calculation.  As you can
tell by running the query, it is incredibly expensive.  We have
potentially millions of inventory records and 10,000+ requests, so the
query is going to take several minutes to complete.

Also, I found that due to the joining on each record(?), I couldn't
implement a limit clause to save time.  Whether or not a limit clause
is tacked onto the end of the query, it completes the entire
calculation process for both tables, and THEN returns the limited
number of records.  This of course saves me no time.  :(

Any help is greatly appreciated,
Thanks.

ps, all pertinent fields have already been indexed.  This reduced the
query time by half, but half of an eternity is still not that
impressive.  I'm also aware that the use of LIKE in my JOIN is
expensive, but after replacing it with a =, I achieved only modest
performance gains.


SELECT
 r.id,
 r.company,
 r.dateSent,
 r.fullName,
 r.phone,
 r.fax,
 r.email,
 r.address1,
 r.address2,
 r.city,
 r.province,
 r.country,
 r.comments,
 r.partNumber,
 r.description,
 r.dateCode,
 r.qty,
 r.targetPrice,
 r.manufacturer,
 r.expiryDate,
 r.companyType,
 r.yearEstablished,
 r.url,
 r.languages,
 GREATEST(i.distySellCost,
i.originalCost,i.unitCost,i.unitSellCost)*r.qty   AS
'highestValue',
 count(i.id) as 'matches',
 SUM(i.qty) as 'qtyAvailable'
FROM request r
LEFT JOIN inventory i ON ( i.MPN LIKE CONCAT(r.partNumber, '%')
OR i.MPNClean LIKE CONCAT(r.partNumber, '%')) AND i.status=1
WHERE r.deleted=0
GROUP BY r.id;


Any help is appreciated.  Thanks.
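
[No reply appears in the archive, but one direction worth noting: when the
LIKE pattern is a literal prefix rather than CONCAT() over a joined column,
MySQL can use a range scan on an MPN index. A hypothetical per-request rollup
— the part number 'ABC123' is made up:]

```sql
-- Roll up the inventory for a single request; with an index on (status, MPN)
-- the constant-prefix LIKE becomes an index range scan.
-- GREATEST(MAX(a), ...) equals the maximum of the per-row GREATEST values.
SELECT GREATEST(MAX(i.distySellCost), MAX(i.originalCost),
                MAX(i.unitCost),      MAX(i.unitSellCost)) AS highestUnitPrice,
       COUNT(i.id) AS matches,
       SUM(i.qty)  AS qtyAvailable
FROM inventory i
WHERE i.status = 1
  AND i.MPN LIKE 'ABC123%';
```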




Query optimization help needed

2005-02-24 Thread Jesse Sheidlower

I asked for help with a version of this query a few months
ago, and subsequently made some improvements to it, and also
fooled around with other parts of my app that were in need of
tuning. I've since done some more extensive benchmarking and
realized that this query really is somewhat slow. Even though
the data set is rather small and everything is (I think)
properly indexed and the joins are sensible, I can't seem to
get rid of the using temporary and using filesort in my
EXPLAIN. I'd be grateful for any suggestions for improving
this.

Here's the query (obviously I run it with different values for
subject.name and different LIMIT values, but this is
representative):

SELECT citation.*, DATE_FORMAT(citation.last_modified, '%e %M, %Y') AS 
last_modified 
FROM citation, subject, citation_subject
WHERE subject.name = 'History'
AND citation_subject.subject_id = subject.id
AND citation_subject.citation_id = citation.id
AND citation.deleted = 0
ORDER BY citation.stripped_word, FIELD(citation.part_of_speech, 'NOUN', 
'ADJECTIVE', 'ADVERB', 'VERB'), citation.id 
LIMIT 150, 50

and EXPLAIN gives me this:

*** 1. row ***
table: subject
 type: ref
possible_keys: PRIMARY,name
  key: name
  key_len: 50
  ref: const
 rows: 1
Extra: Using where; Using temporary; Using filesort
*** 2. row ***
table: citation_subject
 type: ref
possible_keys: citation_id,subject_id
  key: subject_id
  key_len: 4
  ref: subject.id
 rows: 169
Extra: Using index
*** 3. row ***
table: citation
 type: eq_ref
possible_keys: PRIMARY,deleted
  key: PRIMARY
  key_len: 4
  ref: citation_subject.citation_id
 rows: 1
Extra: Using where

Finally, here are the three tables involved. I've trimmed out the
irrelevant columns:

CREATE TABLE `citation` (
  `id` int(11) NOT NULL auto_increment,
  `word` varchar(50) NOT NULL default '',
  `stripped_word` varchar(50) default NULL,
  `part_of_speech` enum('NOUN','VERB','ADJECTIVE','ADVERB') NOT NULL default 
'NOUN',
  `last_modified` timestamp(14) NOT NULL,
  `deleted` datetime default '-00-00 00:00:00',
  PRIMARY KEY  (`id`),
  KEY `deleted` (`deleted`),
  KEY `word` (`word`),
  KEY `stripped_word` (`stripped_word`)
) TYPE=MyISAM

CREATE TABLE `citation_subject` (
  `id` int(11) NOT NULL auto_increment,
  `citation_id` int(11) NOT NULL default '0',
  `subject_id` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `citation_id` (`citation_id`,`subject_id`),
  KEY `subject_id` (`subject_id`,`citation_id`)
) TYPE=MyISAM 

CREATE TABLE `subject` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(50) NOT NULL default '',
  `deleted` datetime default NULL,
  PRIMARY KEY  (`id`),
  KEY `name` (`name`)
) TYPE=MyISAM 

Thank you for any suggestions.

Jesse Sheidlower
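
[No reply appears in the archive. One hypothetical way to attack the
filesort: ORDER BY expressions such as FIELD() can never use an index, but a
precomputed sort key can. Column and index names below are made up:]

```sql
-- Materialize the FIELD() ranking once, then index it with the other sort
-- columns so ORDER BY stripped_word, pos_rank, id can use an index when
-- citation is the driving table.
ALTER TABLE citation ADD pos_rank TINYINT NOT NULL DEFAULT 0;
UPDATE citation
   SET pos_rank = FIELD(part_of_speech, 'NOUN','ADJECTIVE','ADVERB','VERB');
ALTER TABLE citation ADD INDEX idx_sort (stripped_word, pos_rank, id);
```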




Doubt about query optimization

2004-12-14 Thread Ronan Lucio
Hello,

Supposing I have 2 tables:

product
==
- id
- name

price
=
- id
- product_id
- sell_price
- rent_price

If I want to make a SELECT for the products having the
rent_price between 300,00 and 500,00, so I use the query:

SELECT rent_price
FROM product
LEFT JOIN price ON (product.id = price.product_id)
WHERE rent_price BETWEEN 300.00 and 500.00

My doubt is whether the product table will be optimized.
Will the optimization process run over all rows of the
product table, or only over the rows related to the price table?

In other words:
if the price table has other columns and the LEFT JOIN is needed anyway,
would it still be better to move the sell_price and rent_price columns
to the product table?

My table has so many columns that, for structural and maintenance reasons,
it would be better to divide the columns into two tables, but I can't
compromise the application's performance because of it.

Thanks,
Ronan
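
[No reply appears in the archive, but note that the WHERE condition on
rent_price turns the LEFT JOIN into an effective inner join, so an index on
the price side can drive the query regardless of which table holds the
columns. A sketch — the index name is made up:]

```sql
-- Let the range condition on rent_price pick the matching price rows first,
-- then join back to product on product_id.
ALTER TABLE price ADD INDEX idx_rent (rent_price, product_id);
```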






Query optimization question

2004-10-04 Thread Gerald Taylor
Query optimization question
I am  selecting from a single table  but it has a lot of rows and it has 
a very involved calculation.  What I really want to do is
is FIRST restrict the number of rows so that the big calculation is only
performed on the ones that are within 3 degrees.

Using 4.0.20
A sample query  is given here:
The application interpolates variable values such as 44.6 into
the query string, so from mysql's
point of view they are constants, right?  And the explain doc
says it optimizes constants, but  it is looking at all the rows
and I see why.
SELECT city, state, country, latitude, longitude,
IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP'[0-9]+$',
ROUND(DEGREES(ACOS((SIN(RADIANS(44.6))) *
(SIN(RADIANS(latitude))) +
(COS(RADIANS(44.6))) *
(COS(RADIANS(latitude))) *
(COS(RADIANS(-123.28 -longitude)
 * 111),) as distance  FROM londata
WHERE ABS(44.6-latitude) <= 3.0 AND ABS(-123.28-longitude) <= 3.0  ORDER 
BY distance;

I guess I can't do a subselect with my version...
If I could what would it look like?
Something like below?   (I might be able to talk
the powers that be into an upgrade.)  And if I can't
is it more horrible to manually create a temporary table
and perform the calculations on it  than it is to
just do what I am doing?
SELECT city, state, country, latitude, longitude,
 IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP'[0-9]+$',
 ROUND(DEGREES(ACOS((SIN(RADIANS(44.6))) *
(SIN(RADIANS(latitude))) +
(COS(RADIANS(44.6))) *
(COS(RADIANS(latitude))) *
(COS(RADIANS(-123.28 -longitude)
 * 111),) as distance  FROM (SELECT * FROM  londata
   WHERE ABS(44.6-latitude) <= 3.0 AND ABS(-123.28-longitude) <= 3.0) 
as
   sublon  ORDER BY distance;

Thanks.
GT


Re: Query optimization question

2004-10-04 Thread gerald_clark

Gerald Taylor wrote:
Query optimization question
I am  selecting from a single table  but it has a lot of rows and it 
has a very involved calculation.  What I really want to do is
is FIRST restrict the number of rows so that the big calculation is only
performed on the ones that are within 3 degrees.

Using 4.0.20
A sample query  is given here:
The application interpolates variable values such as 44.6 into
the query string, so from mysql's
point of view they are constants, right?  And the explain doc
says it optimizes constants, but  it is looking at all the rows
and I see why.
SELECT city, state, country, latitude, longitude,
IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP'[0-9]+$',
ROUND(DEGREES(ACOS((SIN(RADIANS(44.6))) *
(SIN(RADIANS(latitude))) +
(COS(RADIANS(44.6))) *
(COS(RADIANS(latitude))) *
(COS(RADIANS(-123.28 -longitude)
 * 111),) as distance  FROM londata
WHERE ABS(44.6-latitude) <= 3.0 AND ABS(-123.28-longitude) <= 3.0  
ORDER BY distance;

I guess I can't do a subselect with my version...
If I could what would it look like?
Something like below?   (I might be able to talk
the powers that be into an upgrade.)  And if I can't
is it more horrible to manually create a temporary table
and perform the calculations on it  than it is to
just do what I am doing?
SELECT city, state, country, latitude, longitude,
 IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP'[0-9]+$',
 ROUND(DEGREES(ACOS((SIN(RADIANS(44.6))) *
(SIN(RADIANS(latitude))) +
(COS(RADIANS(44.6))) *
(COS(RADIANS(latitude))) *
(COS(RADIANS(-123.28 -longitude)
 * 111),) as distance  FROM (SELECT * FROM  londata
   WHERE ABS(44.6-latitude) <= 3.0 AND ABS(-123.28-longitude) <= 3.0) as
   sublon  ORDER BY distance;
Thanks.
GT

A Subselect won't help. You still have to scan the whole table to the 
calculation for the where clause.
Pre calculate your min and max lat. and lon.
WHERE latitude <= @maxlat AND latitude >= @minlat AND longitude <= 
@maxlon AND longitude >= @minlon
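
[The bounding-box idea above, written out; the variable names and SET
statements are assumptions:]

```sql
-- Precompute the box once; comparing the columns directly (no ABS()) lets
-- an index on latitude or longitude narrow the scan.
SET @minlat = 44.6 - 3.0,    @maxlat = 44.6 + 3.0;
SET @minlon = -123.28 - 3.0, @maxlon = -123.28 + 3.0;

SELECT city, state, country, latitude, longitude
FROM londata
WHERE latitude  BETWEEN @minlat AND @maxlat
  AND longitude BETWEEN @minlon AND @maxlon;
```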



Re: Query optimization question

2004-10-04 Thread SGreen
When you don't have subselects, you have two options: temporary tables or 
JOINed queries. In your case, I think the temporary table is the better way 
to go. 

I would also eliminate the ABS() check so that I can compare values 
directly against the index. I know the math is correct your way but this 
way you are comparing values directly against the column which means that 
indexes can come into play.


SET @targetLat = 44.6, @targetLon = -123.8, @Delta = 3

CREATE TEMPORARY TABLE tmpDeltaData
SELECT city, state, country, latitude, longitude
FROM Londata
WHERE latitude BETWEEN (@targetLat - @Delta) AND (@targetLat + @Delta)
AND longitude BETWEEN (@targetLon - @Delta) AND (@targetLon + 
@Delta)


SELECT city, state, country, latitude, longitude,
IF(latitude REGEXP '[0-9\\.]+$' AND longitude REGEXP'[0-9\\.]+$',
ROUND(DEGREES(ACOS((SIN(RADIANS(@targetLat))) *
 (SIN(RADIANS(latitude))) +
 (COS(RADIANS(@targetLat))) *
 (COS(RADIANS(latitude))) *
 (COS(RADIANS(@targetLon -longitude)
  * 111),) as distance 
FROM tmpDeltaData
ORDER BY distance

DROP TEMPORARY TABLE tmpDeltaData

I would also test the WHERE clause from tmpDeltaData with your original 
query to compare speeds of the two methods.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Gerald Taylor [EMAIL PROTECTED] wrote on 10/04/2004 10:33:22 AM:

 Query optimization question
 
 I am  selecting from a single table  but it has a lot of rows and it has 

 a very involved calculation.  What I really want to do is
 is FIRST restrict the number of rows so that the big calculation is only
 performed on the ones that are within 3 degrees.
 
 Using 4.0.20
 
 A sample query  is given here:
 The application interpolates variable values such as 44.6 into
 the query string, so from mysql's
 point of view they are constants, right?  And the explain doc
 says it optimizes constants, but  it is looking at all the rows
 and I see why.
 
 SELECT city, state, country, latitude, longitude,
 IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP'[0-9]+$',
 ROUND(DEGREES(ACOS((SIN(RADIANS(44.6))) *
  (SIN(RADIANS(latitude))) +
  (COS(RADIANS(44.6))) *
  (COS(RADIANS(latitude))) *
  (COS(RADIANS(-123.28 -longitude)
   * 111),) as distance  FROM londata
 WHERE ABS(44.6-latitude) <= 3.0 AND ABS(-123.28-longitude) <= 3.0  ORDER 

 BY distance;
 
 
 I guess I can't do a subselect with my version...
 If I could what would it look like?
 Something like below?   (I might be able to talk
 the powers that be into an upgrade.)  And if I can't
 is it more horrible to manually create a temporary table
 and perform the calculations on it  than it is to
 just do what I am doing?
 
 SELECT city, state, country, latitude, longitude,
   IF(latitude REGEXP '[0-9]+$' AND longitude REGEXP'[0-9]+$',
   ROUND(DEGREES(ACOS((SIN(RADIANS(44.6))) *
  (SIN(RADIANS(latitude))) +
  (COS(RADIANS(44.6))) *
  (COS(RADIANS(latitude))) *
  (COS(RADIANS(-123.28 -longitude)
   * 111),) as distance  FROM (SELECT * FROM  londata
  WHERE ABS(44.6-latitude) <= 3.0 AND ABS(-123.28-longitude) <= 3.0) as
 as
 sublon  ORDER BY distance;
 
 Thanks.
 
 GT
 
 
 


Query Optimization

2004-03-13 Thread Kris Burford
Hi,

I'm trying to sort out a query that identifies images that are not in the 
story table (structures below). Peter Brawley has kindly pointed me in the 
right direction with the sql structure (which does work), but I'm getting 
benchmarks of 5+ seconds on the test data, whereas the live site has 50x 
the number of rows and I'm concerned about trying to go live with it.

If anyone could suggest ways to improve the query/table structure, I'd be 
most grateful.

Kris

SELECT i.image_id, i.image_name
FROM table_image i
LEFT JOIN table_story s
USING (mainpic_id)
WHERE s.mainpic_id IS NULL
ORDER BY i.image_name
TABLE IMAGE
image_id int(11) unsigned NOT NULL auto_increment,
  image_name char(64) NOT NULL default 'default',
  PRIMARY KEY  (mainpic_id),
  KEY mainpic_id (mainpic_id)
) TYPE=MyISAM
TABLE STORY
  id int(11) NOT NULL auto_increment,
  body text,
  image_id int(11) unsigned NOT NULL default '0',
  PRIMARY KEY  (id),
  KEY id (id)
) TYPE=MyISAM
And the results from EXPLAIN SELECT:
table = image
type = ALL
possible keys =
key =
key_len =
ref =
rows = 460
extra =
table = story
type = ALL
possible keys =
key =
key_len =
ref =
rows = 610
extra =


Re: Query Optimization [sorted]

2004-03-13 Thread kris burford

I'm trying to sort out a query that identifies images that are not in the 
story table (structures below). Peter Brawley has kindly pointed me in the 
right direction with the sql structure (which does work), but I'm getting 
benchmarks of 5+ seconds on the test data, whereas the live site has 50x 
the number of rows and I'm concerned about trying to go live with it.
Went back to the manual and realised that the table wasn't properly 
indexed. Makes a *big* difference!

Sorry to waste your time/bandwidth.

Kris
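
[For anyone hitting the same wall: Kris doesn't say which index was missing,
but for a LEFT JOIN ... IS NULL anti-join, the join column on the right-hand
(story) side is the usual culprit. A guess at the fix — the column name is
taken from the query as posted, the index name is made up:]

```sql
-- Each image row probes this index instead of scanning every story row.
ALTER TABLE table_story ADD INDEX idx_mainpic (mainpic_id);
```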



Query Optimization

2004-03-07 Thread Volnei Galbino
Hi,
 
Does anybody know where I can find information about query
optimization in MySQL? Which techniques are used?
 
Regards,
 
Volnei Galbino
 


Re: Query Optimization

2004-03-07 Thread Paul DuBois
At 21:55 -0300 3/7/04, Volnei Galbino wrote:
Hi,

Does anybody know where I can find information about query
optimization in MySQL? Which techniques are used?
Regards,

Volnei Galbino
Yes, there's a chapter on optimization in the MySQL Reference Manual.

http://www.mysql.com/doc/en/MySQL_Optimisation.html

--
Paul DuBois, MySQL Documentation Team
Madison, Wisconsin, USA
MySQL AB, www.mysql.com
MySQL Users Conference: April 14-16, 2004
http://www.mysql.com/uc2004/


Re: Query optimization help

2004-02-26 Thread Sasha Pachev
Chuck Gadd wrote:
I've got a query that I can't seem to get optimized, so I'm
hoping someone here can spot something I've missing!
Table has three columns:
CoordID int unsigned,
Zip_Lo char(9),
Zip_Hi char(9)
Table has 3 million records

indexes:
acg_lo (Zip_Lo)
acg_hi (Zip_Hi)
acg_combined (Zip_Lo, Zip_Hi)


Here's the query:

select * from acg
where zip4_lo_pot = '80128' and
  zip4_hi_pot = '80128'


Explain shows:

type: ALL
possible keys: acg_lo,acg_hi,acg_combined
rows: 3022309
extra: Using where
This kind of query cannot be efficiently optimized on a pre-4.1 version. With 
4.1, if you are using MyISAM tables you could make (zip4_lo_pot,zip4_hi_pot) a 
spatial column with a spatial index. See 
http://www.mysql.com/doc/en/Spatial_extensions_in_MySQL.html
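
[A rough sketch of what Sasha describes, using MyISAM and 4.1-era GIS syntax;
treating the (lo, hi) pair as a 2-D point is an assumption about the data,
and the column/index names are made up:]

```sql
-- Store (zip_lo, zip_hi) as a POINT and index it spatially; range-containment
-- lookups can then use MBRContains() against a query rectangle.
ALTER TABLE acg ADD zip_pt POINT NOT NULL;
UPDATE acg
   SET zip_pt = GeomFromText(CONCAT('POINT(', zip4_lo_pot, ' ', zip4_hi_pot, ')'));
ALTER TABLE acg ADD SPATIAL INDEX sp_zip (zip_pt);
```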

--
Sasha Pachev
Create online surveys at http://www.surveyz.com/

