We are trying to develop the updateable cursors functionality into Postgresql. I have given below details of the design and also issues we are facing. Looking forward to the advice on how to proceed with these issues.
Rgds, Arul Shaji 1. Introduction -------------- This is a combined proposal and design document for adding updatable (insensitive) cursor capability to the PostgreSQL database. There have already been a couple of previous proposals since 2003 for implementing this feature so there appears to be community interest in doing so. This will enable the following constructs to be processed: UPDATE <table_name> SET value_list WHERE CURRENT OF <cursor_name> DELETE FROM <table_name> WHERE CURRENT OF <cursor_name> This has the effect of users being able to update or delete specific rows of a table, as defined by the row currently fetched into the cursor. 2. Overall Conceptual Design ----------------------------- The design is considered from the viewpoint of progression of a command through the various stages of processing, from changes to the file gram.y to implement the actual grammar changes, through to changes in the Executor portion of the database architecture. 2.1 Changes to the Grammar ------------------------------ The following changes will be done to the PostgreSQL grammar: UPDATE statement has the option WHERE CURRENT OF <cursor_name> added DELETE statement has the option WHERE CURRENT OF <cursor_name> added The cursor_name data is held in the UpdateStmt and DeleteStmt structures and contains just the name of the cursor. The pl/pgsql grammar changes in the same manner. The word CURRENT will be added to the ScanKeywords array in keywords.c. 2.2 Changes to Affected Data Structures ------------------------------------------ The following data structures are affected by this change: Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt structures The Portal will contain a list of structures of relation ids and tuple ids relating to the tuple held in the QueryDesc structure. There will be one entry in the relation and tuple id list for each entry in the relation-list of the statement below: DECLARE <cursor_name> [WITH HOLD] SELECT FOR UPDATE OF <relation-list> The QueryDesc structure will contain the relation id and the tuple id relating to the tuple obtained via the FETCH command so that it can be propagated back to the Portal for storage in the list described above. The UpdateStmt and DeleteStmt structures have the cursor name added so that the information is available for use in obtaining the portal structure related to the cursor previously opened via the DECLARE CURSOR request. 2.3 Changes to the SQL Parser ------------------------------------ At present, although the FOR UPDATE clause of the DECLARE CURSOR command has been present in the grammar, it causes an error message later in the processing since cursors are currently not updatable. This now needs to change. The FOR UPDATE clause has to be valid, but not the FOR SHARE clause. The relation names that follow the FOR UPDATE clause will be added to the rtable in the Query structure and identified by means of the rowMarks array. In the case of an updatable cursor the FOR SHARE option is not allowed therefore all entries in the rtable that are identified by the rowMarks array must relate to tables that are FOR UPDATE. In the UPDATE or DELETE statements the WHERE CURRENT OF <cursor_name> clause results in the cursor name being placed in the UpdateStmt or DeleteStmt structure. During the processing of the functions - transformDeleteStmt() and transformUpdateStmt() - the cursor name is used to obtain a pointer to the related Portal structure and the tuple affected by the current UPDATE or DELETE statement is extracted from the Portal, where it has been placed as the result of a previous FETCH request. At this point all the information for the UPDATE or DELETE statement is available so the statements can be transformed into standard UPDATE or DELETE statements and sent for re-write/planning/execution as usual. 2.4 Changes to the Optimizer ------------------------------ There is a need to add a TidScan node to planning UPDATE / DELETE statements where the statements are UPDATE / DELETE at position. This is to enable the tuple ids of the tuples in the tables relating to the query to be obtained. There will need to be a new mechanism to achieve this, as at present, a Tid scan is done only if there is a standard WHERE condition on update or delete statements to provide Tid qualifier data. 2.5 Changes to the Executor ------------------------------- There are various options that have been considered for this part of the enhancement. These are described in the sections below. We would like to hear opinions on which option is the best way to go or if none of these is acceptable, any alternate ideas ? Option 1 MVCC Via Continuous Searching of Database The Executor is to be changed in the following ways: 1) When the FETCH statement is executed the id of the resulting tuple is extracted and passed back to the Portal structure to be saved to indicate the cursor is currently positioned on a tuple. 2) When the UPDATE or DELETE request is executed the tuple id previously FETCHed is held in the QueryDesc structure so that it can be compared with the tuple ids returned from the TidScan node processed prior to the actual UPDATE / DELETE node in the plan. This enables a decision to be made as to whether the tuple held in the cursor is visible to the UPDATE / DELETE request according to the rules of concurrency. The result is that, at the cost of repeatedly searching the database at each UPDATE / DELETE command, the hash table is no longer required. This approach has the advantage that there is no hash table held in memory or on disk so it will not be memory intensive but will be processing intensive. This is a good one-off solution to the problem and, taken in isolation is probably the best approach. However, if one considers the method(s) used in other areas of PostgreSQL, it is probably not the best solution. This option will probably not be used further. Option 2 MVCC via New Snapshot The executor can be changed by adding a new kind of snapshot that is specifically used for identifying if a given tuple, retrieved from the database during an update or delete statement should be visible during the current transaction. This approach requires a new kind of snapshot (this idea was used by Gavin for a previous updatable cursor patch but objections were raised.) Option 3 MVCC Via Hash Table in Memory The executor can be changed by saving into a hash table and comparing each tuple in the cursor with that set to check if the tuple should be visible. This approach has the advantage that it will be quick. It has the disadvantage that, since the hash table will contain all the tuples of the table being checked that it may use all local memory for a large table. Option 4 MVCC Via Hash Table on Disk When the UPDATE or DELETE request is executed the first time the Tid scan database retrieval will be done first. At this time the tuple id of each row in the table to be updated by the request will be available in the executor. These tuple ids need to be stored in a hash table that is stored to disk, as, if the table is large there could be a huge number of tuple ids. This data is then available for comparison with the individual tuple to be updated or deleted to check if it should be processed. The hash table will exist for the duration of the transaction, from BEGIN to END (or ABORT). The hash table is then used to identify if the tuple should be visible during the current transaction. If the tuple should be visible then the update or delete proceeds as usual. This approach has the advantage that it will use little memory but will be relatively slow as the data has to be accessed from disk. Option 5 Store Tuple Id in Snapshot. The Snapshot structure can be changed to include the tuple id. This enables the current state of the tuple to be identified with respect to the current transaction. The tuple id, as identified in the cursor at the point where the DELETE/UPDATE statement is being processed, can use the snapshot to identify if the tuple should be visible in the context of the current transaction. 2.6 Changes to the Catalog ---------------------------- The Catalog needs to reflect changes introduced by the updatable cursor implementation. A boolean attribute is_for_update is to be added to the pg_cursors implementation. It will define that the cursor is for update (value is FALSE) or for share (value is TRUE, the default value). 3 Design Assumptions ---------------------------- The following design assumptions are made: As PostgreSQL8.2 does not support the SENSITIVE cursor option the tuples contained in a cursor can never be updated so these tuples will always appear in their original form as at the start of the transaction. This is in breach of the SQL2003 Standard as described in 5WD-02-Foundation-2003-09.pdf, p 810. The standard requires the updatable cursor to be declared as sensitive. With respect to nested transactions In PostgreSQL nested transactions are implemented by defining save points via the keyword SAVEPOINT. A ROLLBACK TO SAVEPOINT rolls back the database contents to the last savepoint in this transaction or the begin statement, whichever is closer. It is assumed that the FETCH statement is used to return only a single row into the cursor with each command when the cursor is updatable. According to the SQL2003 Standard Update and Delete statements may contain only a single base table. The DECLARE CURSOR statement is supposed to use column level locking, but PostgreSQL supports only row level locking. The result of this is that the column list that the standard requires DECLARE <cursor_name> SELECT FOR UPDATE OF column-list becomes a relation (table) list. This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinary user of the email address to which it was addressed and may contain copyright and/or legally privileged information. No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive this email in error, please return to sender. Thank you. If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please email [EMAIL PROTECTED] ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match