Joining on non unique index is irrationally slow, I can beat join performance with stored procedure 100 fold!!!!

Bereczki Gabor Mon, 21 Feb 2005 22:03:32 -0800

Hi, 
I am working with 5.0.3 snapshot on x86_64. I am now experiencing with
features of MySQL, so please ignore that in my example I use self join .
I know that this particular example could be done much easier but
eventually I am going to join multiple tables with similar structure
(one preprocessing table and one final table).
My query is:


mysql> insert into log select t1.uniprot_primary_ac from
uniprot_attributes as t1 left join uniprot_attributes as t2 on
(t1.attribute_name=t2.attribute_name and
t1.attribute_value=t2.attribute_value) where
t2.uniprot_primary_ac="P15017";

Query OK, 748 rows affected (16.31 sec)
Records: 748  Duplicates: 0  Warnings: 0


mysql> explain select t1.uniprot_primary_ac from uniprot_attributes as
t1 left join uniprot_attributes as t2 on
(t1.attribute_name=t2.attribute_name and
t1.attribute_value=t2.attribute_value) where
t2.uniprot_primary_ac="P15017";     -> ?
+----+-------------+-------+------
+----------------------------------------------+----------+---------
+------------------------+-------+-------------+
| id | select_type | table | type | possible_keys
| key      | key_len | ref          | rows  | Extra       |
+----+-------------+-------+------
+----------------------------------------------+----------+---------
+------------------------+-------+-------------+
|  1 | SIMPLE      | t2    | ref  |
up_att_1,up_att_3,up_att_4,up_att_5,up_att_2 | up_att_3 | 23      |
const          |     9 | Using where |
|  1 | SIMPLE      | t1    | ref  | up_att_4,up_att_5,up_att_2
| up_att_4 | 8       | test.t2.attribute_name | 35655 | Using where |
+----+-------------+-------+------
+----------------------------------------------+----------+---------
+------------------------+-------+-------------+

Why does it take 16 seconds to join 9 rows to 35655 rows on a x86_64
machine with 2GB RAM nearly all processor capacity available?
Why is it using where, when there are indexes for all possible
combinations?
I indexed the table almost all possible ways to find a way to optimize
the query but no help:
mysql> show index from uniprot_attributes;
    -> ?
+--------------------+------------+----------+--------------
+--------------------+-----------+-------------+----------+--------
+------+------------+---------+
| Table              | Non_unique | Key_name | Seq_in_index |
Column_name        | Collation | Cardinality | Sub_part | Packed | Null
| Index_type | Comment |
+--------------------+------------+----------+--------------
+--------------------+-----------+-------------+----------+--------
+------+------------+---------+
| uniprot_attributes |          1 | up_att_1 |            1 |
uniprot_primary_ac | A         |      170522 |     NULL | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_1 |            2 |
attribute_name     | A         |     1961014 |        4 | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_3 |            1 |
uniprot_primary_ac | A         |      170522 |     NULL | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_3 |            2 |
attribute_name     | A         |     1961014 |        5 | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_3 |            3 |
attribute_value    | A         |     3922028 |        7 | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_4 |            1 |
attribute_name     | A         |         109 |        5 | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_4 |            2 |
attribute_value    | A         |      980507 |        7 | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_4 |            3 |
uniprot_primary_ac | A         |     3922028 |     NULL | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_5 |            1 |
attribute_value    | A         |      980507 |        7 | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_5 |            2 |
attribute_name     | A         |      980507 |        5 | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_5 |            3 |
uniprot_primary_ac | A         |     3922028 |     NULL | NULL   | YES
| BTREE      |         |
| uniprot_attributes |          1 | up_att_2 |            1 |
attribute_value    | NULL      |           1 |     NULL | NULL   | YES
| FULLTEXT   |         |
+--------------------+------------+----------+--------------
+--------------------+-----------+-------------+----------+--------
+------+------------+---------+
12 rows in set (0.01 sec)


Moreover I created a stored procedure that does exactly the same as the
select statement above:
mysql> create procedure test_performance(IN uniprot_id VARCHAR(20) )
-> begin
    -> declare v_attribute_name VARCHAR(20);
    -> declare v_attribute_value TEXT;
    -> declare done int default 0;
    -> declare cur1 cursor for select attribute_name, attribute_value
from uniprot_attributes where uniprot_primary_ac=uniprot_id;
    -> declare continue handler for sqlstate '02000' set done=1;
    -> open cur1;
    -> mainloop: repeat
    -> fetch cur1 into v_attribute_name, v_attribute_value;
    -> if done=0 then
    ->     insert into log select distinct uniprot_primary_ac from
uniprot_attributes where attribute_name=v_attribute_name and
attribute_value=v_attribute_value;
    -> end if;
    -> until done end repeat mainloop;
    -> close cur1;
    -> end
    -> ?
Query OK, 0 rows affected (0.00 sec)

mysql> call test_performance("P15017");? 
Query OK, 1 row affected (0.13 sec)  

The stored procedure does exactly the same as the query but 120x
faster!!!!
How is it possible?
Is there a workaround to optimize the query? Please dont come with "use
not null columns, drop redundant indexes" because it doesnt address the
core of the problem. Why is not the select statement as efficient as the
stored procedure?
It is important because I dont want to rely on stored procedures too
much because I found there is a memory bottleneck problem with SPs ( see
"Are stored procedures merely code generators?" posting), if there are
too many action statements, the stored procedures rapidly uses up all
memory without doing anything.
Thanks,
-- 
Bereczki Gabor <[EMAIL PROTECTED]>

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Joining on non unique index is irrationally slow, I can beat join performance with stored procedure 100 fold!!!!

Reply via email to