Karl Wright created CONNECTORS-584: -------------------------------------- Summary: Crawling with MySQL does not use indexes for order-by on critical queries Key: CONNECTORS-584 URL: https://issues.apache.org/jira/browse/CONNECTORS-584 Project: ManifoldCF Issue Type: Bug Components: Framework core Affects Versions: ManifoldCF 1.0.1 Reporter: Karl Wright Assignee: Karl Wright Fix For: ManifoldCF 1.1
The following ORDER-BY query is taking a long time on MySQL: {code} -------------------------------------------------------- # Time: 121204 16:25:40 # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1] # Query_time: 7.240532 Lock_time: 0.000204 Rows_sent: 1200 Rows_examined: 611091 SET timestamp=1354605940; SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND t0.checktime<=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid AND t1.priority=5) AND NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) ORDER BY t0.docpriority ASC,t0.status ASC,t0.checkaction ASC,t0.checktime ASC LIMIT 1200; # Time: 121204 16:25:44 # User@Host: manifoldcf[manifoldcf] @ localhost [127.0.0.1] # Query_time: 3.064339 Lock_time: 0.000084 Rows_sent: 1 Rows_examined: 406359 SET timestamp=1354605944; SELECT docpriority,jobid,dochash,docid FROM jobqueue t0 WHERE status IN ('P','G') AND checkaction='R' AND checktime<=1354605932817 AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND t1.id=t0.jobid) ORDER BY docpriority ASC,status ASC,checkaction ASC,checktime ASC LIMIT 1; ------------------------------------------------------- {code} I wonder if the queries appropriately use index of the table. As a result of EXPLAIN against the slow query, there was filesort. There seems to be some conditions that MySQL does not use index depending on ORDER BY: - Executing ORDER BY against multiple keys - When keys selected from records are different from keys used by ORDER BY Since filesort was happening, fully scanning records should be having MCF slower. Do you think this could happen even in PostgreSQL or HSQLDB? Do you think queries could be modified to use index appropriately? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira