Dennis Cote wrote:



The fast query does a table scan of the Timesheet table looking for records that are between the specified dates. For each such record, it looks up the client, project, and employee records by matching the timesheet fields to the primary key of those tables. These extended records are collected and sorted.

Assuming 10% of the timesheet records match the date range, you have one scan of the 4000 record table, and 400 direct record lookups in each of the other three tables. Then a sort of 400 records.

This query would execute even faster if there was an index on the date field of the timesheet table. Then it could go directly to the first record that matches the date range and scan only the 400 records until the first date that is outside the date range.

The slow query on the other hand is not very good. It does cross join of the project and employee tables to produce 15 x 200 = 3000 records with every combination of project and employee. Then for each of those records it does a lookup of the client records that corresponds to the project using the primary key of the client table. Now for each of these 3000 extended records, it scans the 4000 record timesheet table looking for records that match the employee id, client id, and project id. That is 3000 scans of 4000 records, or 12,000,000 sets of comparisons. Each one that matches produces an extended record with all the data from the four tables. Finally, checks if the date of the extended record is between the specified dates. This query does not sort the final results. It should locate the same number of records as the first query, but will be much slower since it starts at the wrong end.


Thank you for the very thorough explanation. Indeed, adding an index on timesheet.date makes the slow query quite fast. The worst case scenario, in which the date range is wide enough that all records are selected, is unaffected, which is fine.

When I moved this change to the server which is actually going to be running this (which was Sqlite 3.1.x) the query was still slow even with the index available. However, upgrading it to 3.3.3 and ANALYZE'ing it fixed the problem there, as well.

BTW, the "slow" query is not supposed to sort the final results, since it works better with SQLObject if I sort the result in Python.

At some point I will be switching to SQLAlchemy as my ORM and will have much better flexibility, I think.

Thanks again,

Steve

Reply via email to