Hi all (especially Richard Hipp and his anonymous co-developers),

I'm making this (very long) post to open up discussions about a couple of
new features that I think would improve SQLite. Sadly, this new feature
request does not include any patches to implement these features (which I
understand always go a long way to getting the feature addition approved).

The following is a typical SQL query that does a fairly complicated
conditional lookup using several tables. It will serve as an example for the
discussion of the proposed new features.

--query to get the value of a named property for a master
--  assuming that the value may be inherited from a master template

SELECT
    CASE
        WHEN
            --get named property from master
            (
            SELECT device_property_value FROM Device_Property_Value
            WHERE device_id = ? --:master_id
            AND device_property_id =
                (
                SELECT device_property_id FROM Device_Property
                WHERE device_kind_id =
                    (
                    SELECT device_kind_id FROM Device
                    WHERE device_id = ? --:master_id
                    )
                AND device_property_name = ? --:property_name
                )
            )
            --if it is undefined
            ISNULL
            THEN
                --get the named property from the master's master
                --  template
                (
                SELECT device_property_value FROM Device_Property_Value
                WHERE device_id =
                    --lookup master template device id
                    (
                    SELECT device_property_value
                    FROM Device_Property_Value
                    WHERE device_id = ? --:master_id
                    AND device_property_id =
                        (
                        SELECT device_property_id FROM Device_Property
                        WHERE device_kind_id =
                            (
                            SELECT device_kind_id FROM Device
                            WHERE device_id = ? --:master_id
                            )
                        AND device_property_name = 'Master Template'
                        )
                    )
                AND device_property_id =
                    --lookup property id of the master template property
                    --  with the same name
                    (
                    SELECT device_property_id FROM Device_Property
                    WHERE device_kind_id =
                        (
                        SELECT device_kind_id FROM Device_Kind
                        WHERE device_kind_name = 'Master Template'
                        )
                    AND device_property_name = ? --:property_name
                    )
                )
        ELSE
            --the master's local property is defined so return that
            (
            SELECT device_property_value FROM Device_Property_Value
            WHERE device_id = ? --:master_id
            AND device_property_id =
                (
                SELECT device_property_id FROM Device_Property
                WHERE device_kind_id =
                    (
                    SELECT device_kind_id FROM Device
                    WHERE device_id = ? --:master_id
                    )
                AND device_property_name = ? --:property_name
                )
            )
        END



This query uses the following table definitions.

CREATE TABLE Device (
    device_id               INTEGER PRIMARY KEY,
    device_kind_id          INTEGER NOT NULL REFERENCES Device_Kind,
    device_number           INTEGER(5) NOT NULL,
    device_name             VARCHAR(20),
    device_info             VARCHAR,
    UNIQUE (device_kind_id, device_number)
);

CREATE TABLE Device_Kind (
    device_kind_id          INTEGER PRIMARY KEY,
    device_kind_name        VARCHAR UNIQUE NOT NULL
);

CREATE TABLE Device_Property (
    device_property_id      INTEGER PRIMARY KEY,
    device_kind_id          INTEGER NOT NULL REFERENCES Device_Kind,
    device_property_name    VARCHAR NOT NULL,
    device_property_type    VARCHAR DEFAULT 'INTEGER',
    device_property_units   VARCHAR DEFAULT NULL,
    device_property_DEFAULT VARCHAR DEFAULT NULL,
    UNIQUE (device_kind_id, device_property_name)
);

CREATE TABLE Device_Property_Value (
    device_id               INTEGER REFERENCES Device,
    device_property_id      INTEGER REFERENCES Device_Property,
    device_property_value   VARCHAR,
    PRIMARY KEY (device_id, device_property_id)
);


This query exposes a couple of minor problems I see with SQLite.

The first issue concerns the parameters, the ? characters in the query. I'm
using the experimental sqlite_bind API to bind parameters to general SQL
queries. The current API uses positional parameters. Therefore I need to
make an sqlite_bind call for each parameter, or 9 calls in the case of this
example query. This is unfortunate since I am really only using two
parameter values, the master_id and property_name, shown in the comments
after each parameter. For this query I need to bind the master_id value to 6
positional parameters, and the property_name value to 3 positional
parameters using sqlite_bind calls.

I find that this type of thing occurs quite often, where the same parameter
value is used multiple time in a single query.

***** First Feature Request *****

I would like to propose that sqlite_bind be enhanced to support named
parameters rather than positional parameters. I would suggest that SQLite
should adopt the colon prefixed names defined in the SQL:1999 standard for
parameters.

Named parameters would allow this query to be completed with only 2 calls to
sqlite_bind, one for each of my actual parameter values. Named parameters
result in much more readable SQL code without the need for comments
associated with each parameter (or without the need to rely on memeory).
Named parameters are also resistant to problems created by editing the base
SQL query and moving the relative positions of the parameters.

I beleive that SQLite could discover all the named parameters in a query as
it is parsed, much as I suspect it does now with the ? positional
parameters. The names of the parameters would be saved in an array in the VM
that is compiled as they are discovered. The index of the parameter could
then be used to reference it in the VDBE code that is generated. Any
repeated parameter names would simply use the index of the previously
discovered parameter name. The sqlite_bind call should be changed to take a
parameter name and the parameter value as arguments. It would then save the
supplied parameter value in the parameter array. The values of the named
parameters can then be accessed from the parameter array by the VM when the
query executes.

The improved query below demonstrates the use of the proposed named
parameters. Each named parameter consists of a colon character followed by a
user defined name. In this csae I'm using :master_id and :property_name as
parameters to the query.


--query to get the value of a named property for a master
--  assuming that the value may be inherited from its master template

WITH
    master_kind_query (master_kind_id) AS
        --get device_kind_id of master
        (
        SELECT device_kind_id FROM Device
        WHERE device_id = :master_id
        ),
    master_query (master_property_value) AS
        --get named property from master
        (
        SELECT device_property_value FROM Device_Property_Value
        WHERE device_id = :master_id
        AND device_property_id =
            (
            SELECT device_property_id FROM Device_Property
            WHERE device_kind_id = (SELECT master_kind_id FROM
                master_kind_query)
            AND device_property_name = :property_name
            )
        )
SELECT
    CASE
        WHEN (SELECT master_property_value FROM master_query) ISNULL
            THEN
                --get the named property from the master's master
                --  template
                (
                SELECT device_property_value FROM Device_Property_Value
                WHERE device_id =
                    --lookup master template device id
                    (
                    SELECT device_property_value
                    FROM Device_Property_Value
                    WHERE device_id = :master_id
                    AND device_property_id =
                        (
                        SELECT device_property_id FROM Device_Property
                        WHERE device_kind_id = (SELECT master_kind_id
                            FROM master_kind_query)
                        AND device_property_name = 'Master Template'
                        )
                    )
                AND device_property_id =
                    --lookup property id of the master template property
                    --  with the same name
                    (
                    SELECT device_property_id FROM Device_Property
                    WHERE device_kind_id =
                        (
                        SELECT device_kind_id FROM Device_Kind
                        WHERE device_kind_name = 'Master Template'
                        )
                    AND device_property_name = :property_name
                    )
                )
        ELSE
            --the master's local property is defined so return that
            (SELECT master_property_value FROM master_query)
        END


***** Second Feature Request *****

This second example SQL query also demonstrates the use of the my second
proposed feature addition, the WITH caluse. A WITH clause is a part of the
new SQL:1999 standard's defnition of a SELECT statement. It allows named
subqueries to be defined for, and referenced within, a SELECT statement.

In the first example, the first sub-query is repeated twice, once after the
WHEN, and a second time, after the ELSE. This subquery is identical in both
locations. In a high power SQL server database system the query optimizer
would probably discover this repetition and factor that sub-query
automatically. I don't believe that SQLite does this kind of optimization.
As a result SQLite ends up repeating the same query twice, once to evaluate
the WHEN condition, and again to generate the returned result if the first
result (the same value) was not NULL.

The WITH clause in the new standard basically allows the user to perform
this optimization by hand. The user can factor any repeated sub-queries out
of the main query by moving them into the WITH clause. These sub-queries are
named in the WITH clause so they can be referenced in the main query.

The WITH clause contains a comma seperated list of sub-queries. Each of
these sub-queries is assigned a name and can optionally be followed by a
named column list. The named column list is important when the sub-query
results are the produce by functions such as COUNT() or MAX(), or complex
queries with joined tables. Each sub-query is followed by the SELECT
statement that implements the sub-query.

The sub-queries in the WITH clause can also reference any of the prior named
sub-queries.

In this example I first factored out the repeated sub-query that looks up
the master's property value (after the WHEN and after the ELSE in the first
example) and named it master_query. I then factored out the simple query
that looks up the master's device_kind_id, which was repeated three times in
the first example query, and named it master_kind_query.

Note that the master_query references the maste_kind_query, so that the
master_kind_query needs to be placed first in the WITH clause.

As a result, this query can be executed more quickly and with less
duplicated effort. It is also much easier to understand, at least it is for
me.

As to implementing the WITH clause, I beleive that SQLite could effectively
rewrite this SQL query as a series of queries that generate a couple of
temporary tables, perform the main query, and then drop the temporary
tables. Of course, my application could do the same thing but that would
require it to issue a series of seven SQL compile, bind, step, finalize API
calls, whereas a single series of calls would suffice if the WITH clause is
implemented internally in SQLite. This is so much extra work that, if there
is no support for the WITH clause, I would rather use the first example
query unless the performance cost of the duplicated sub-queries became an
issue.

This example query using a WITH clause could be re-written as the following
series of SQL statments.

CREATE TEMP TABLE master_kind_query (master_kind_id);
INSERT INTO master_kind_query
    SELECT device_kind_id FROM Device
    WHERE device_id = :master_id;
CREATE TEMP TABLE master_query (master_property_value);
INSERT INTO master_query
    SELECT device_property_value FROM Device_Property_Value
    WHERE device_id = :master_id
    AND device_property_id =
        (
        SELECT device_property_id FROM Device_Property
        WHERE device_kind_id = (SELECT master_kind_id
            FROM master_kind_query)
        AND device_property_name = :property_name
        );
SELECT
    CASE
        WHEN (SELECT master_property_value FROM master_query) ISNULL
            THEN
                --get the named property from the master's master
                --  template
                (
                SELECT device_property_value FROM Device_Property_Value
                WHERE device_id =
                    --lookup master template device id
                    (
                    SELECT device_property_value
                    FROM Device_Property_Value
                    WHERE device_id = :master_id
                    AND device_property_id =
                        (
                        SELECT device_property_id FROM Device_Property
                        WHERE device_kind_id = (SELECT master_kind_id
                            FROM master_kind_query)
                        AND device_property_name = 'Master Template'
                        )
                    )
                AND device_property_id =
                    --lookup property id of the master template property
                    --  with the same name
                    (
                    SELECT device_property_id FROM Device_Property
                    WHERE device_kind_id =
                        (
                        SELECT device_kind_id FROM Device_Kind
                        WHERE device_kind_name = 'Master Template'
                        )
                    AND device_property_name = :property_name
                    )
                )
        ELSE
            --the master's local property is defined so return that
            (SELECT master_property_value FROM master_query)
        END;
DROP TABLE master_query;
DROP TABLE master_kind_query;

If this rewriting was done internally to SQLite, and the resulting
statements were compiled to a single VM and executed all at once, the result
would be a faster implementation of the original query which required only a
single set of API calls to be executed.

I hope this explains my suggestions clearly enough for everyone to
understand. If not, let the questions fly, and I will try to answer anything
I can.

Dennis Cote

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to