Steve Bergman wrote:
None of the tables have many records. Timesheet has about 4000.
employee has about 15. Client has about 100. project has about 200.
The "Fast query, below, executes in about a second.
But I'm curious what is "wrong" with the "Slow query".
Schema and EXPLAIN results are also included.
SQLite version is 3.3.3.
===========================================
Fast query:
SELECT
* , timesheet.id as id, description as desc
FROM Timesheet, Client, Project, Employee
WHERE
(Timesheet.employee_id = 9 OR '1' = '1') AND
Timesheet.client_id = Client.id AND
Timesheet.project_id = Project.id AND
Timesheet.employee_id = Employee.id AND
Timesheet.date >= '2006-04-02' AND
Timesheet.date <= '2006-04-12' AND
(client.client_name like '%%' OR
project.project_desc like '%%' OR
employee.last_name like '%%' OR
employee.first_name like '%%' OR
timesheet.description like '%%' OR
timesheet.internal_note like '%%')
ORDER BY date
=========================================
Slow query:
SELECT
timesheet.id,
timesheet.date,
timesheet.time,
timesheet.description,
timesheet.internal_note,
timesheet.emp_bill_status,
timesheet.travel_time,
timesheet.regular_time,
timesheet.onsite_support,
timesheet.phone_support,
timesheet.remote_support,
timesheet.employee_id,
timesheet.client_id,
timesheet.project_id,
timesheet.inoffice_support
FROM
project, employee, client, timesheet WHERE
(
(timesheet.date >= '2006-04-01 00:00:00') AND
((timesheet.date <= '2006-04-11 00:00:00') AND
((employee.id = timesheet.employee_id) AND
((client.id = timesheet.client_id) AND
((project.id = timesheet.project_id) AND
((client.id = project.client_id) AND
((client.client_name LIKE '%%') OR
((employee.last_name LIKE '%%') OR
((timesheet.description LIKE '%%') OR
(project.project_desc LIKE '%%' )
)))))))));
Steve,
If I'm reading your queries correctly they can be simplified down to the
following two equivalent queries.
Fast query:
SELECT
* , timesheet.id as id, description as desc
FROM Timesheet
JOIN Client on Timesheet.client_id = Client.id
JOIN Project on Timesheet.project_id = Project.id
JOIN Employee on Timesheet.employee_id = Employee.id
WHERE Timesheet.date >= '2006-04-02'
AND Timesheet.date <= '2006-04-12'
ORDER BY date;
Slow query:
SELECT
timesheet.id,
timesheet.date,
timesheet.time,
timesheet.description,
timesheet.internal_note,
timesheet.emp_bill_status,
timesheet.travel_time,
timesheet.regular_time,
timesheet.onsite_support,
timesheet.phone_support,
timesheet.remote_support,
timesheet.employee_id,
timesheet.client_id,
timesheet.project_id,
timesheet.inoffice_support
FROM project
CROSS JOIN employee
JOIN client on client.id = project.client_id
JOIN timesheet on (
employee.id = timesheet.employee_id AND
client.id = timesheet.client_id AND
project.id = timesheet.project_id
)
WHERE timesheet.date >= '2006-04-01 00:00:00'
AND timesheet.date <= '2006-04-11 00:00:00';
The fast query does a table scan of the Timesheet table looking for
records that are between the specified dates. For each such record, it
looks up the client, project, and employee records by matching the
timesheet fields to the primary key of those tables. These extended
records are collected and sorted.
Assuming 10% of the timesheet records match the date range, you have one
scan of the 4000 record table, and 400 direct record lookups in each of
the other three tables. Then a sort of 400 records.
This query would execute even faster if there was an index on the date
field of the timesheet table. Then it could go directly to the first
record that matches the date range and scan only the 400 records until
the first date that is outside the date range.
The slow query on the other hand is not very good. It does cross join of
the project and employee tables to produce 15 x 200 = 3000 records with
every combination of project and employee. Then for each of those
records it does a lookup of the client records that corresponds to the
project using the primary key of the client table. Now for each of these
3000 extended records, it scans the 4000 record timesheet table looking
for records that match the employee id, client id, and project id. That
is 3000 scans of 4000 records, or 12,000,000 sets of comparisons. Each
one that matches produces an extended record with all the data from the
four tables. Finally, checks if the date of the extended record is
between the specified dates. This query does not sort the final results.
It should locate the same number of records as the first query, but will
be much slower since it starts at the wrong end.
HTH
Dennis Cote