Hi All, I am looking at making some changes to SQLite to correct the behavior of the last_insert_rowid() function within triggers. While investigating these changes I came upon some points where there are several different ways to handle the semantics. I discussed these points briefly with Richard Hipp, and he suggested these issues should be vetted on the mailing list before any changes are implemented. So here goes...
Ticket #290 (http://www.sqlite.org/cvstrac/tktview?tn=290) originally reported a problem with last_insert_rowid() and triggers. This issue was fixed by blocking all updates to the last_insert_rowid variable within a trigger. This fix created another problem reported in ticket #481 (http://www.sqlite.org/cvstrac/tktview?tn=481) and again on ticket #576 (http://www.sqlite.org/cvstrac/tktview?tn=576). This issue is what I am trying to fix. My proposed fix would establish a context for each trigger program to execute in. This context would contain its own private copy of the last_insert_rowid variable. INSERT statements within the trigger program would update the local variable. That value would not affect the value seen in, or returned by, the calling context which could be a top level SQL statement or another trigger. The local copy would be initialized with the value from the calling context, and then updated as INSERT statements execute within the trigger program. At the end of the trigger program the value of the last_insert_rowid variable will be restored to its original value from the calling context. While preparing the required changes I noticed that SQLite currently updates the last_insert_rowid variable on UPDATE statements as well. I think this behavior is wrong and should be corrected as well. I have also looked into the values reported by the sqlite_changes() API function since it is currently updated at the same point in the SQLite code. Currently every INSERT or UPDATE changes the last_rowid and increments the change count. DELETES only increment the change count. The change count value is reset under certain conditions, and then incremented for every insert, update, or delete performed on a table. The sqlite_changes() API returns the value of this variable. The change count is not currently available from SQL code, which means it can't be used in a trigger. I propose adding a last_statement_changed() function that would return the number of rows changed by the last SQL statement. The value returned by this function would be updated at the end of every INSERT, UPDATE, or DELETE statement. SELECT statements would not change the value returned by this function. Calling this function in a SELECT or WHERE clause of an INSERT, UPDATE, or DELETE statement would return the value set at the end of the previously completed statement, not the number of rows changed so far by the current statement. A new temporary variable would be cleared at the beginning of each INSERT, UPDATE, and DELETE statement, modified as the statement executed, and then copied to the change count variable at the end of the statement. Currently, the change count is only reset at the beginning of every sqlite_exec() call or before an sqlite_compile() call if there are no unfinalized VMs associated with the database. Under normal circumstances, with a single SQL statement being executed or compiled (and the VM being immediately finalized) this variable holds the number of changes resulting from that statement. However, if a single call to sqlite_exec() passes an entire script (i.e. multiple SQL statements) then the variable reports the total number of rows changed by all the commands in the script. It seems awkward to leave this variable alone for backward compatibility, and also add a new variable that does almost the same thing to provide the correct semantics for use in triggers (and other SQL statements in scripts). I would propose changing SQLite to reset this change count at the beginning of each INSERT, UPDATE, or DELETE statement. Before doing so, I would like to know if anyone relies on the current behavior of the sqlite_changes() when executing a script of statements or multiple VMs. The change count variable would be saved, updated, and restored using a private context for trigger programs in the same way as the last_insert_rowid variable. This would allow trigger programs to determine the effects of previous statements within the trigger, and yet not have the trigger affect the value returned after the main statement completed. For both of these functions there are special considerations required for views. Inserts, updates, and deletes on views must be implemented as INSTEAD OF triggers, so the question arises; what values should the functions last_insert_rowid() and last_statement_changes() return after an INSERT, UPDATE, or DELETE statement on a view? Since views do not have rowids (try it, they are always NULL), I think that last_insert_rowid() should either remain unchanged, or return NULL. The first behavior will occur automatically as a result of restoring the original value after the trigger program completes. The second behavior would require additional changes to SQLite to let the last_insert_rowid variable be set to a null value, and a new vdbe opcode to change its value to null after an inert into a view. I don't think the additional functionality is worth the additional code, but maybe someone else has a good reason to implement it (or perhaps an even better idea). Since a common use for views is to make several joined tables look like one table, users may want want to export rowids from one or more of the tables modified by the trigger program for use after the insert into a view. This can be done by updating tables with the values returned by last_insert_rowid() within the trigger. See my example under ticket #576. Similar arguments apply to updating the change count variable based on changes to views. I think the default behavior of restoring the value after the trigger executes (i.e. not reporting the changes that occur inside the trigger program) is sufficient. INSERTS, UPDATES, and DELETES on views would always return zero changes. If the user wants to report the number of changes made by some of the statements within the INSTEAD OF trigger program, they can update another table using the value(s) returned by one or more calls to last_statement_changed(). This way they can export exactly the required value from the trigger code. This explicit export of values from within the view triggers avoids problems that arise because of the number of statements, or the order of execution of statements, within the trigger program. They don't need to worry about double or triple changes being reported because the view is based on two or three joined tables. They also don't need to know which order inserts are applied to the tables that a view is based on. The only other approach I can see would involve substantially more significant changes to SQLite. If we added some additional syntax to the INSTEAD OF triggers, we could create a temporary table with a single record that the trigger program could update. This table could be called EXPORT and it would have two columns, ROWID and CHANGES. This temporary table would function within the trigger program much like the NEW and OLD entries do now. The principal difference being that they would be writable so that updates could be applied to them. Within the INSTEAD OF trigger code, the user could update the ROWID and/or CHANGES field of the EXPORT table using normal UPDATE statements. This would typically be done by calling last_insert_rowid() or last_statement_changes() in an UPDATE statement at the appropriate point in the trigger program. At the end of the trigger program the values in these temporary tables would be used to update the last_insert_rowid and change_count variables for the calling context, rather than restoring the original values. I believe this change would allow users to make views behave exactly like normal tables as far as these status functions are concerned. But, since this idea will require far more involved changes to SQLite, I will leave it to someone else to implement (or at least leave it for another day). :-) If anyone has any further ideas, question, concerns, or suggestions, please let me know. In particular, I am worried that some users may be dependent on the current behavior of the sqlite_changes() API function, or the fact that UPDATES change the last_insert_rowid variable, which I am proposing to change. Unless there is a strong objection, or a better idea, I will make these changes and submit them to Richard for inclusion in SQLite. For guidance on how other database systems handle these issues I reviewed the standard and several other systems' documentation. They are all different, and different from SQLite, so there is no de-facto standard interface to follow. The handling of these features by the other databases is summarized (its a long summary) below. The SQL:1999 standard doesn't say anything about returning a rowid for an INSERT (because the standard does not support automatically defined or incremented field values) or the number of rows affected by an UPDATE or DELETE. These features are all vendor specific extensions to SQL. mySQL has an SQL function similar to last_insert_rowid(). This is what it does (from the mySQL manual). LAST_INSERT_ID([expr]) Returns the last automatically generated value that was inserted into an AUTO_INCREMENT column. mysql> SELECT LAST_INSERT_ID(); -> 195 The last ID that was generated is maintained in the server on a per-connection basis. This means the value the function returns to a given client is the most recent AUTO_INCREMENT value generated by that client. The value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that you can retrieve your own ID without concern for the activity of other clients, and without the need for locks or transactions. The value of LAST_INSERT_ID() is not changed if you update the AUTO_INCREMENT column of a row with a non-magic value (that is, a value that is not NULL and not 0). If you insert many rows at the same time with an insert statement, LAST_INSERT_ID() returns the value for the first inserted row. The reason for this is to make it possible to easily reproduce the same INSERT statement against some other server. If expr is given as an argument to LAST_INSERT_ID(), then the value of the argument is returned by the function, and is set as the next value to be returned by LAST_INSERT_ID(). This can be used to simulate sequences: Create a table to hold the sequence counter and initialize it: mysql> CREATE TABLE sequence (id INT NOT NULL); mysql> INSERT INTO sequence VALUES (0); Use the table to generate sequence numbers like this: mysql> UPDATE sequence SET id=LAST_INSERT_ID(id+1); mysql> SELECT LAST_INSERT_ID(); The UPDATE statement increments the sequence counter and causes the next call to LAST_INSERT_ID() to return the updated value. The SELECT statement retrieves that value. The C API function mysql_insert_id() can also be used to get the value. See section 19.1.3.32 mysql_insert_id(). You can generate sequences without calling LAST_INSERT_ID(), but the utility of using the function this way is that the ID value is maintained in the server as the last automatically generated value. It is multi-user safe because multiple clients can issue the UPDATE statement and get their own sequence value with the SELECT statement (or mysql_insert_id()), without affecting or being affected by other clients that generate their own sequence values. Note that mysql_insert_id() is only updated after INSERT and UPDATE statements, so you can't use the C API function to retrieve the value for LAST_INSERT_ID(expr) after executing other SQL statements like SELECT or SET. mySQL reports the number or rows updated by a UPDATE statement directly as the result of the command on the command line, and it provide the same information using the mysql_info() API function as described in the documentation below. There is no SQL function defined to return the number of rows changed. If you set a column to the value it currently has, MySQL notices this and doesn't update it. UPDATE returns the number of rows that were actually changed. In MySQL Version 3.22 or later, the C API function mysql_info() returns the number of rows that were matched and updated and the number of warnings that occurred during the UPDATE. If you update a column that has been declared NOT NULL by setting to NULL, the column is set to the default value appropriate for the column type and the warning count is incremented. The default value is is 0 for numeric types, the empty string ('') for string types, and the ``zero'' value for date and time types. PostGreSQL does things differently. They return a result from each SQL command, in particular INSERT, DELETE, and UPDATE commands. Again from the PostGreSQL documentation; INSERT Command: On successful completion, an INSERT command returns a command tag of the form INSERT oid count The count is the number of rows inserted. If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the inserted row. Otherwise oid is zero. DELETE command: On successful completion, a DELETE command returns a command tag of the form DELETE count The count is the number of rows deleted. If count is 0, no rows matched the condition (this is not considered an error). UPDATE command: On successful completion, an UPDATE command returns a command tag of the form UPDATE count The count is the number of rows updated. If count is 0, no rows matched the condition (this is not considered an error). These command tags can be retrieved using an API function PQCmdStatus. There does not appear to be any SQL function that can retrieve these values. These functions are used to extract information from PGresult objects that are not SELECT results. PQcmdStatus Returns the command status tag from the SQL command that generated the PGresult. char * PQcmdStatus(PGresult *res); Commonly this is just the name of the command, but it may include additional data such as the number of rows processed. PosGreSQL does not use auto-increment fields either. They use sequence values, like Oracle. They are described below: This section describes PostgreSQL's functions for operating on sequence objects. Sequence objects (also called sequence generators or just sequences) are special single-row tables created with CREATE SEQUENCE. A sequence object is usually used to generate unique identifiers for rows of a table. The sequence functions, listed in Table 9-34, provide simple, multiuser-safe methods for obtaining successive sequence values from sequence objects The available sequence functions are: nextval Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value. currval Return the value most recently obtained by nextval for this sequence in the current session. (An error is reported if nextval has never been called for this sequence in this session.) Notice that because this is returning a session-local value, it gives a predictable answer even if other sessions are executing nextval meanwhile. setval Reset the sequence object's counter value. The two-parameter form sets the sequence's last_value field to the specified value and sets its is_called field to true, meaning that the next nextval will advance the sequence before returning a value. In the three-parameter form, is_called may be set either true or false. If it's set to false, the next nextval will return exactly the specified value, and sequence advancement commences with the following nextval. For example, SELECT setval('foo', 42); Next nextval will return 43 SELECT setval('foo', 42, true); Same as above SELECT setval('foo', 42, false); Next nextval will return 42 The result returned by setval is just the value of its second argument. Important: To avoid blocking of concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the nextval later aborts. This means that aborted transactions may leave unused "holes" in the sequence of assigned values. setval operations are never rolled back, either. If a sequence object has been created with default parameters, nextval calls on it will return successive values beginning with 1. Other behaviors can be obtained by using special parameters in the CREATE SEQUENCE command; see its command reference page for more information. Oracle does not use auto-increment fields either. They have things called sequences which are used to generate unique values for table keys. You can retrieve the current value or the next value from a sequence. There are also many restrictions on where these items can appear in the SQL code. The basic idea can be gleaned from the following examples from their documentation. Inserting sequence values into a table: Example This example increments the employee sequence and uses its value for a new employee inserted into the sample table hr.employees: INSERT INTO employees VALUES (employees_seq.nextval, 'John', 'Doe', 'jdoe', '555-1212', TO_DATE(SYSDATE), 'PU_CLERK', 2500, null, null, 30); Reusing the current value of a sequence: Example This example adds a new order with the next order number to the master order table. It then adds suborders with this number to the detail order table: INSERT INTO orders (order_id, order_date, customer_id) VALUES (orders_seq.nextval, TO_DATE(SYSDATE), 106); INSERT INTO order_items (order_id, line_item_id, product_id) VALUES (orders_seq.currval, 1, 2359); INSERT INTO order_items (order_id, line_item_id, product_id) VALUES (orders_seq.currval, 2, 3290); INSERT INTO order_items (order_id, line_item_id, product_id) VALUES (orders_seq.currval, 3, 2381); As best I can tell from a relatively quick look at the Oracle documentation, they do not provide a mechanism to determine the number of rows changed by an UPDATE, DELETE, or INSERT statement. If anyone knows that they do, please tell me how it is done. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

