[sqlite] proposed changes for group review

Dennis Cote Thu, 12 Feb 2004 16:39:59 -0800

Hi All,

I am looking at making some changes to SQLite to correct the behavior of
the last_insert_rowid() function within triggers. While investigating
these changes I came upon some points where there are several different
ways to handle the semantics. I discussed these points briefly with
Richard Hipp, and he suggested these issues should be vetted on the
mailing list before any changes are implemented. So here goes...


Ticket #290 (http://www.sqlite.org/cvstrac/tktview?tn=290) originally
reported a problem with last_insert_rowid() and triggers. This issue was
fixed by blocking all updates to the last_insert_rowid variable within a
trigger. This fix created another problem reported in ticket #481
(http://www.sqlite.org/cvstrac/tktview?tn=481) and again on ticket #576
(http://www.sqlite.org/cvstrac/tktview?tn=576). This issue is what I am
trying to fix.

My proposed fix would establish a context for each trigger program to
execute in. This context would contain its own private copy of the
last_insert_rowid variable. INSERT statements within the trigger program
would update the local variable. That value would not affect the value
seen in, or returned by, the calling context which could be a top level
SQL statement or another trigger. The local copy would be initialized
with the value from the calling context, and then updated as INSERT
statements execute within the trigger program. At the end of the trigger
program the value of the last_insert_rowid variable will be restored to
its original value from the calling context.

While preparing the required changes I noticed that SQLite currently
updates the last_insert_rowid variable on UPDATE statements as well. I
think this behavior is wrong and should be corrected as well.

I have also looked into the values reported by the sqlite_changes() API
function since it is currently updated at the same point in the SQLite
code. Currently every INSERT or UPDATE changes the last_rowid and
increments the change count. DELETES only increment the change count.
The change count value is reset under certain conditions, and then
incremented for every insert, update, or delete performed on a table.
The sqlite_changes() API returns the value of this variable.

The change count is not currently available from SQL code, which means
it can't be used in a trigger. I propose adding a
last_statement_changed() function that would return the number of rows
changed by the last SQL statement. The value returned by this function
would be updated at the end of every INSERT, UPDATE, or DELETE
statement. SELECT statements would not change the value returned by this
function. Calling this function in a SELECT or WHERE clause of an
INSERT, UPDATE, or DELETE statement would return the value set at the
end of the previously completed statement, not the number of rows
changed so far by the current statement. A new temporary variable would
be cleared at the beginning of each INSERT, UPDATE, and DELETE
statement, modified as the statement executed, and then copied to the
change count variable at the end of the statement.

Currently, the change count is only reset at the beginning of every
sqlite_exec() call or before an sqlite_compile() call if there are no
unfinalized VMs associated with the database. Under normal
circumstances, with a single SQL statement being executed or compiled
(and the VM being immediately finalized) this variable holds the number
of changes resulting from that statement. However, if a single call to
sqlite_exec() passes an entire script (i.e. multiple SQL statements)
then the variable reports the total number of rows changed by all the
commands in the script.

It seems awkward to leave this variable alone for backward
compatibility, and also add a new variable that does almost the same
thing to provide the correct semantics for use in triggers (and other
SQL statements in scripts). I would propose changing SQLite to reset
this change count at the beginning of each INSERT, UPDATE, or DELETE
statement. Before doing so, I would like to know if anyone relies on the
current behavior of the sqlite_changes() when executing a script of
statements or multiple VMs.

The change count variable would be saved, updated, and restored using a
private context for trigger programs in the same way as the
last_insert_rowid variable. This would allow trigger programs to
determine the effects of previous statements within the trigger, and yet
not have the trigger affect the value returned after the main statement
completed.

For both of these functions there are special considerations required
for views. Inserts, updates, and deletes on views must be implemented as
INSTEAD OF triggers, so the question arises; what values should the
functions last_insert_rowid() and last_statement_changes() return after
an INSERT, UPDATE, or DELETE statement on a view?

Since views do not have rowids (try it, they are always NULL), I think
that last_insert_rowid() should either remain unchanged, or return NULL.
The first behavior will occur automatically as a result of restoring the
original value after the trigger program completes. The second behavior
would require additional changes to SQLite to let the last_insert_rowid
variable be set to a null value, and a new vdbe opcode to change its
value to null after an inert into a view. I don't think the additional
functionality is worth the additional code, but maybe someone else has a
good reason to implement it (or perhaps an even better idea).

Since a common use for views is to make several joined tables look like
one table, users may want want to export rowids from one or more of the
tables modified by the trigger program for use after the insert into a
view. This can be done by updating tables with the values returned by
last_insert_rowid() within the trigger. See my example under ticket
#576.

Similar arguments apply to updating the change count variable based on
changes to views. I think the default behavior of restoring the value
after the trigger executes (i.e. not reporting the changes that occur
inside the trigger program) is sufficient. INSERTS, UPDATES, and DELETES
on views would always return zero changes. If the user wants to report
the number of changes made by some of the statements within the INSTEAD
OF trigger program, they can update another table using the value(s)
returned by one or more calls to last_statement_changed(). This way they
can export exactly the required value from the trigger code.

This explicit export of values from within the view triggers avoids
problems that arise because of the number of statements, or the order of
execution of statements, within the trigger program. They don't need to
worry about double or triple changes being reported because the view is
based on two or three joined tables. They also don't need to know which
order inserts are applied to the tables that a view is based on.

The only other approach I can see would involve substantially more
significant changes to SQLite. If we added some additional syntax to the
INSTEAD OF triggers, we could create a temporary table with a single
record that the trigger program could update. This table could be called
EXPORT and it would have two columns, ROWID and CHANGES. This temporary
table would function within the trigger program much like the NEW and
OLD entries do now. The principal difference being that they would be
writable so that updates could be applied to them. Within the INSTEAD OF
trigger code, the user could update the ROWID and/or CHANGES field of
the EXPORT table using normal UPDATE statements. This would typically be
done by calling last_insert_rowid() or last_statement_changes() in an
UPDATE statement at the appropriate point in the trigger program. At the
end of the trigger program the values in these temporary tables would be
used to update the last_insert_rowid and change_count variables for the
calling context, rather than restoring the original values.

I believe this change would allow users to make views behave exactly
like normal tables as far as these status functions are concerned. But,
since this idea will require far more involved changes to SQLite, I will
leave it to someone else to implement (or at least leave it for another
day). :-)

If anyone has any further ideas, question, concerns, or suggestions,
please let me know. In particular, I am worried that some users may be
dependent on the current behavior of the sqlite_changes() API function,
or the fact that UPDATES change the last_insert_rowid variable, which I
am proposing to change.

Unless there is a strong objection, or a better idea, I will make these
changes and submit them to Richard for inclusion in SQLite.



For guidance on how other database systems handle these issues I
reviewed the standard and several other systems' documentation. They are
all different, and different from SQLite, so there is no de-facto
standard interface to follow. The handling of these features by the
other databases is summarized (its a long summary) below.

The SQL:1999 standard doesn't say anything about returning a rowid for
an INSERT (because the standard does not support automatically defined
or incremented field values) or the number of rows affected by an UPDATE
or DELETE. These features are all vendor specific extensions to SQL.

mySQL has an SQL function similar to last_insert_rowid(). This is what
it does (from the mySQL manual).

    LAST_INSERT_ID([expr]) Returns the last automatically generated
    value that was inserted into an AUTO_INCREMENT column. mysql> SELECT
    LAST_INSERT_ID(); -> 195 The last ID that was generated is
    maintained in the server on a per-connection basis. This means the
    value the function returns to a given client is the most recent
    AUTO_INCREMENT value generated by that client. The value cannot be
    affected by other clients, even if they generate AUTO_INCREMENT
    values of their own. This behavior ensures that you can retrieve
    your own ID without concern for the activity of other clients, and
    without the need for locks or transactions. The value of
    LAST_INSERT_ID() is not changed if you update the AUTO_INCREMENT
    column of a row with a non-magic value (that is, a value that is not
    NULL and not 0). If you insert many rows at the same time with an
    insert statement, LAST_INSERT_ID() returns the value for the first
    inserted row. The reason for this is to make it possible to easily
    reproduce the same INSERT statement against some other server. If
    expr is given as an argument to LAST_INSERT_ID(), then the value of
    the argument is returned by the function, and is set as the next
    value to be returned by LAST_INSERT_ID(). This can be used to
    simulate sequences: Create a table to hold the sequence counter and
    initialize it: mysql> CREATE TABLE sequence (id INT NOT NULL);
    mysql> INSERT INTO sequence VALUES (0); Use the table to generate
    sequence numbers like this: mysql> UPDATE sequence SET
    id=LAST_INSERT_ID(id+1); mysql> SELECT LAST_INSERT_ID(); The UPDATE
    statement increments the sequence counter and causes the next call
    to LAST_INSERT_ID() to return the updated value. The SELECT
    statement retrieves that value. The C API function mysql_insert_id()
    can also be used to get the value. See section 19.1.3.32
    mysql_insert_id(). You can generate sequences without calling
    LAST_INSERT_ID(), but the utility of using the function this way is
    that the ID value is maintained in the server as the last
    automatically generated value. It is multi-user safe because
    multiple clients can issue the UPDATE statement and get their own
    sequence value with the SELECT statement (or mysql_insert_id()),
    without affecting or being affected by other clients that generate
    their own sequence values. Note that mysql_insert_id() is only
    updated after INSERT and UPDATE statements, so you can't use the C
    API function to retrieve the value for LAST_INSERT_ID(expr) after
    executing other SQL statements like SELECT or SET.

mySQL reports the number or rows updated by a UPDATE statement directly
as the result of the command on the command line, and it provide the
same information using the mysql_info() API function as described in the
documentation below. There is no SQL function defined to return the
number of rows changed. If you set a column to the value it currently
has, MySQL notices this and doesn't update it. UPDATE returns the number
of rows that were actually changed. In MySQL Version 3.22 or later, the
C API function mysql_info() returns the number of rows that were matched
and updated and the number of warnings that occurred during the UPDATE.
If you update a column that has been declared NOT NULL by setting to
NULL, the column is set to the default value appropriate for the column
type and the warning count is incremented. The default value is is 0 for
numeric types, the empty string ('') for string types, and the ``zero''
value for date and time types. PostGreSQL does things differently. They
return a result from each SQL command, in particular INSERT, DELETE, and
UPDATE commands. Again from the PostGreSQL documentation;

    INSERT Command: On successful completion, an INSERT command returns
    a command tag of the form

    INSERT oid count The count is the number of rows inserted. If count
    is exactly one, and the target table has OIDs, then oid is the OID
    assigned to the inserted row. Otherwise oid is zero.

    DELETE command: On successful completion, a DELETE command returns a
    command tag of the form DELETE count The count is the number of rows
    deleted. If count is 0, no rows matched the condition (this is not
    considered an error).

    UPDATE command:

    On successful completion, an UPDATE command returns a command tag of
    the form UPDATE count The count is the number of rows updated. If
    count is 0, no rows matched the condition (this is not considered an
    error).

These command tags can be retrieved using an API function PQCmdStatus.
There does not appear to be any SQL function that can retrieve these
values.

    These functions are used to extract information from PGresult
    objects that are not SELECT results. PQcmdStatus Returns the command
    status tag from the SQL command that generated the PGresult. char *
    PQcmdStatus(PGresult *res); Commonly this is just the name of the
    command, but it may include additional data such as the number of
    rows processed.

PosGreSQL does not use auto-increment fields either. They use sequence
values, like Oracle. They are described below:

    This section describes PostgreSQL's functions for operating on
    sequence objects. Sequence objects (also called sequence generators
    or just sequences) are special single-row tables created with CREATE
    SEQUENCE. A sequence object is usually used to generate unique
    identifiers for rows of a table. The sequence functions, listed in
    Table 9-34, provide simple, multiuser-safe methods for obtaining
    successive sequence values from sequence objects

    The available sequence functions are: nextval Advance the sequence
    object to its next value and return that value. This is done
    atomically: even if multiple sessions execute nextval concurrently,
    each will safely receive a distinct sequence value. currval Return
    the value most recently obtained by nextval for this sequence in the
    current session. (An error is reported if nextval has never been
    called for this sequence in this session.) Notice that because this
    is returning a session-local value, it gives a predictable answer
    even if other sessions are executing nextval meanwhile. setval Reset
    the sequence object's counter value. The two-parameter form sets the
    sequence's last_value field to the specified value and sets its
    is_called field to true, meaning that the next nextval will advance
    the sequence before returning a value. In the three-parameter form,
    is_called may be set either true or false. If it's set to false, the
    next nextval will return exactly the specified value, and sequence
    advancement commences with the following nextval. For example,
    SELECT setval('foo', 42); Next nextval will return 43 SELECT
    setval('foo', 42, true); Same as above SELECT setval('foo', 42,
    false); Next nextval will return 42 The result returned by setval is
    just the value of its second argument. Important: To avoid blocking
    of concurrent transactions that obtain numbers from the same
    sequence, a nextval operation is never rolled back; that is, once a
    value has been fetched it is considered used, even if the
    transaction that did the nextval later aborts. This means that
    aborted transactions may leave unused "holes" in the sequence of
    assigned values. setval operations are never rolled back, either. If
    a sequence object has been created with default parameters, nextval
    calls on it will return successive values beginning with 1. Other
    behaviors can be obtained by using special parameters in the CREATE
    SEQUENCE command; see its command reference page for more
    information. Oracle does not use auto-increment fields either. They
    have things called sequences which are used to generate unique
    values for table keys. You can retrieve the current value or the
    next value from a sequence. There are also many restrictions on
    where these items can appear in the SQL code. The basic idea can be
    gleaned from the following examples from their documentation.
    Inserting sequence values into a table: Example

    This example increments the employee sequence and uses its value for
    a new employee inserted into the sample table hr.employees: INSERT
    INTO employees VALUES (employees_seq.nextval, 'John', 'Doe', 'jdoe',
    '555-1212', TO_DATE(SYSDATE), 'PU_CLERK', 2500, null, null, 30);
    Reusing the current value of a sequence: Example This example adds a
    new order with the next order number to the master order table. It
    then adds suborders with this number to the detail order table:
    INSERT INTO orders (order_id, order_date, customer_id) VALUES
    (orders_seq.nextval, TO_DATE(SYSDATE), 106); INSERT INTO order_items
    (order_id, line_item_id, product_id) VALUES (orders_seq.currval, 1,
    2359); INSERT INTO order_items (order_id, line_item_id, product_id)
    VALUES (orders_seq.currval, 2, 3290); INSERT INTO order_items
    (order_id, line_item_id, product_id) VALUES (orders_seq.currval, 3,
    2381);

As best I can tell from a relatively quick look at the Oracle
documentation, they do not provide a mechanism to determine the number
of rows changed by an UPDATE, DELETE, or INSERT statement. If anyone
knows that they do, please tell me how it is done.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[sqlite] proposed changes for group review

Reply via email to