Karl Wright created CONNECTORS-584:
--------------------------------------

             Summary: Crawling with MySQL does not use indexes for order-by on 
critical queries
                 Key: CONNECTORS-584
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-584
             Project: ManifoldCF
          Issue Type: Bug
          Components: Framework core
    Affects Versions: ManifoldCF 1.0.1
            Reporter: Karl Wright
            Assignee: Karl Wright
             Fix For: ManifoldCF 1.1


The following ORDER-BY query is taking a long time on MySQL:

{code}
--------------------------------------------------------
# Time: 121204 16:25:40
# User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1]
# Query_time: 7.240532  Lock_time: 0.000204 Rows_sent: 1200  Rows_examined: 
611091
SET timestamp=1354605940;
SELECT 
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
 FROM jobqueue t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND 
t0.checktime<=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status 
IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' 
FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN 
('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' 
FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) 
ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC 
LIMIT 1200;
# Time: 121204 16:25:44
# User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1]
# Query_time: 3.064339  Lock_time: 0.000084 Rows_sent: 1  Rows_examined: 406359
SET timestamp=1354605944;
SELECT docpriority,jobid,dochash,docid FROM jobqueue t0 WHERE status IN 
('P','G') AND checkaction='R' AND checktime<=1354605932817 AND EXISTS(SELECT 
'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid)  ORDER BY 
docpriority ASC,status ASC,checkaction ASC,checktime ASC LIMIT 1;
-------------------------------------------------------
{code}

I wonder if the queries appropriately use index of the table. 
As a result of EXPLAIN against the slow query, there was filesort.
There seems to be some conditions that MySQL does not use index depending on 
ORDER BY:
 - Executing ORDER BY against multiple keys
 - When keys selected from records are different from keys used by ORDER BY

Since filesort was happening, fully scanning records should be having MCF 
slower.

Do you think this could happen even in PostgreSQL or HSQLDB?
Do you think queries could be modified to use index appropriately?




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to