Steve Bergman wrote:

None of the tables have many records. Timesheet has about 4000. employee has about 15. Client has about 100. project has about 200.

The "Fast query, below, executes in about a second.

But I'm curious what is "wrong" with the "Slow query".

Schema and EXPLAIN results are also included.

SQLite version is 3.3.3.

===========================================
Fast query:

SELECT
* , timesheet.id as id, description as desc
FROM Timesheet, Client, Project, Employee
WHERE
(Timesheet.employee_id = 9 OR '1' = '1') AND
Timesheet.client_id = Client.id AND
Timesheet.project_id = Project.id AND
Timesheet.employee_id = Employee.id AND
Timesheet.date >= '2006-04-02' AND
Timesheet.date <= '2006-04-12' AND
(client.client_name like '%%' OR
project.project_desc like '%%' OR
employee.last_name like '%%' OR
employee.first_name like '%%' OR
timesheet.description like '%%' OR
timesheet.internal_note like '%%')
ORDER BY date
=========================================
Slow query:

SELECT
timesheet.id,
timesheet.date,
timesheet.time,
timesheet.description,
timesheet.internal_note,
timesheet.emp_bill_status,
timesheet.travel_time,
timesheet.regular_time,
timesheet.onsite_support,
timesheet.phone_support,
timesheet.remote_support,
timesheet.employee_id,
timesheet.client_id,
timesheet.project_id,
timesheet.inoffice_support
FROM
project, employee, client, timesheet WHERE
(
(timesheet.date >= '2006-04-01 00:00:00') AND
((timesheet.date <= '2006-04-11 00:00:00') AND
((employee.id = timesheet.employee_id) AND
((client.id = timesheet.client_id) AND
((project.id = timesheet.project_id) AND
((client.id = project.client_id) AND
((client.client_name LIKE '%%') OR
((employee.last_name LIKE '%%') OR
((timesheet.description LIKE '%%') OR
(project.project_desc LIKE '%%' )
)))))))));

Steve,

If I'm reading your queries correctly they can be simplified down to the following two equivalent queries.

Fast query:

   SELECT
   * , timesheet.id as id, description as desc
   FROM Timesheet
   JOIN Client on Timesheet.client_id = Client.id
   JOIN Project on Timesheet.project_id = Project.id
   JOIN Employee on Timesheet.employee_id = Employee.id
   WHERE Timesheet.date >= '2006-04-02'
   AND Timesheet.date <= '2006-04-12'
   ORDER BY date;

Slow query:

   SELECT
   timesheet.id,
   timesheet.date,
   timesheet.time,
   timesheet.description,
   timesheet.internal_note,
   timesheet.emp_bill_status,
   timesheet.travel_time,
   timesheet.regular_time,
   timesheet.onsite_support,
   timesheet.phone_support,
   timesheet.remote_support,
   timesheet.employee_id,
   timesheet.client_id,
   timesheet.project_id,
   timesheet.inoffice_support
   FROM project
   CROSS JOIN employee
   JOIN client on client.id = project.client_id
   JOIN timesheet on (
       employee.id = timesheet.employee_id AND
       client.id = timesheet.client_id AND
       project.id = timesheet.project_id
       )
   WHERE timesheet.date >= '2006-04-01 00:00:00'
   AND timesheet.date <= '2006-04-11 00:00:00';

The fast query does a table scan of the Timesheet table looking for records that are between the specified dates. For each such record, it looks up the client, project, and employee records by matching the timesheet fields to the primary key of those tables. These extended records are collected and sorted.

Assuming 10% of the timesheet records match the date range, you have one scan of the 4000 record table, and 400 direct record lookups in each of the other three tables. Then a sort of 400 records.

This query would execute even faster if there was an index on the date field of the timesheet table. Then it could go directly to the first record that matches the date range and scan only the 400 records until the first date that is outside the date range.

The slow query on the other hand is not very good. It does cross join of the project and employee tables to produce 15 x 200 = 3000 records with every combination of project and employee. Then for each of those records it does a lookup of the client records that corresponds to the project using the primary key of the client table. Now for each of these 3000 extended records, it scans the 4000 record timesheet table looking for records that match the employee id, client id, and project id. That is 3000 scans of 4000 records, or 12,000,000 sets of comparisons. Each one that matches produces an extended record with all the data from the four tables. Finally, checks if the date of the extended record is between the specified dates. This query does not sort the final results. It should locate the same number of records as the first query, but will be much slower since it starts at the wrong end.

HTH
Dennis Cote

Reply via email to