FAST PostgreSQL wrote:
> We are trying to develop the updateable cursors functionality into
> Postgresql. I have given below details of the design and also issues we are
> facing. Looking forward to the advice on how to proceed with these issues.
>
> Rgds,
> Arul Shaji
Would this be something that you would hope to submit for 8.3?
Joshua D. Drake
>
>
>
>
>
> 1. Introduction
> --------------
> This is a combined proposal and design document for adding updatable
> (insensitive) cursor capability to the PostgreSQL database.
> There have already been a couple of previous proposals since 2003 for
> implementing this feature so there appears to be community interest in doing
> so. This will enable the following constructs to be processed:
>
>
> UPDATE <table_name> SET value_list WHERE CURRENT OF <cursor_name>
> DELETE FROM <table_name> WHERE CURRENT OF <cursor_name>
>
> This has the effect of users being able to update or delete specific rows of
> a table, as defined by the row currently fetched into the cursor.
>
>
> 2. Overall Conceptual Design
> -----------------------------
> The design is considered from the viewpoint of progression of a command
> through the various stages of processing, from changes to the file ‘gram.y’
> to implement the actual grammar changes, through to changes in the Executor
> portion of the database architecture.
>
> 2.1 Changes to the Grammar
> ------------------------------
> The following changes will be done to the PostgreSQL grammar:
>
> UPDATE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
> DELETE statement has the option ‘WHERE CURRENT OF <cursor_name>’ added
>
> The cursor_name data is held in the UpdateStmt and DeleteStmt structures and
> contains just the name of the cursor.
>
> The pl/pgsql grammar changes in the same manner.
>
> The word CURRENT will be added to the ScanKeywords array in keywords.c.
>
>
> 2.2 Changes to Affected Data Structures
> ------------------------------------------
> The following data structures are affected by this change:
>
> Portal structure, QueryDesc structure, the UpdateStmt and DeleteStmt
> structures
>
> The Portal will contain a list of structures of relation ids and tuple ids
> relating to the tuple held in the QueryDesc structure. There will be one
> entry in the relation and tuple id list for each entry in the relation-list
> of the statement below:
>
> DECLARE <cursor_name> [WITH HOLD] SELECT FOR UPDATE OF <relation-list>
>
> The QueryDesc structure will contain the relation id and the tuple id
> relating to the tuple obtained via the FETCH command so that it can be
> propagated back to the Portal for storage in the list described above.
>
> The UpdateStmt and DeleteStmt structures have the cursor name added so that
> the information is available for use in obtaining the portal structure
> related to the cursor previously opened via the DECLARE CURSOR request.
>
>
> 2.3 Changes to the SQL Parser
> ------------------------------------
> At present, although the FOR UPDATE clause of the DECLARE CURSOR command has
> been present in the grammar, it causes an error message later in the
> processing since cursors are currently not updatable. This now needs to
> change. The ‘FOR UPDATE’ clause has to be valid, but not the ‘FOR SHARE’
> clause.
>
> The relation names that follow the ‘FOR UPDATE’ clause will be added to the
> rtable in the Query structure and identified by means of the rowMarks array.
> In the case of an updatable cursor the FOR SHARE option is not allowed
> therefore all entries in the rtable that are identified by the rowMarks array
> must relate to tables that are FOR UPDATE.
>
> In the UPDATE or DELETE statements the ‘WHERE CURRENT OF <cursor_name>’
> clause results in the cursor name being placed in the UpdateStmt or
> DeleteStmt structure. During the processing of the functions -
> transformDeleteStmt() and transformUpdateStmt() - the cursor name is used to
> obtain a pointer to the related Portal structure and the tuple affected by
> the current UPDATE or DELETE statement is extracted from the Portal, where it
> has been placed as the result of a previous FETCH request. At this point all
> the information for the UPDATE or DELETE statement is available so the
> statements can be transformed into standard UPDATE or DELETE statements and
> sent for re-write/planning/execution as usual.
>
> 2.4 Changes to the Optimizer
> ------------------------------
> There is a need to add a TidScan node to planning UPDATE / DELETE statements
> where the statements are ‘UPDATE / DELETE at position’. This is to enable the
> tuple ids of the tuples in the tables relating to the query to be obtained.
> There will need to be a new mechanism to achieve this, as at present, a Tid
> scan is done only if there is a standard WHERE condition on update or delete
> statements to provide Tid qualifier data.
>
>
> 2.5 Changes to the Executor
> -------------------------------
> There are various options that have been considered for this part of the
> enhancement. These are described in the sections below.
>
> We would like to hear opinions on which option is the best way to go or if
> none of these is acceptable, any alternate ideas ?
>
> Option 1 MVCC Via Continuous Searching of Database
>
> The Executor is to be changed in the following ways:
> 1) When the FETCH statement is executed the id of the resulting tuple is
> extracted and passed back to the Portal structure to be saved to indicate the
> cursor is currently positioned on a tuple.
> 2) When the UPDATE or DELETE request is executed the tuple id previously
> FETCHed is held in the QueryDesc structure so that it can be compared with
> the tuple ids returned from the TidScan node processed prior to the actual
> UPDATE / DELETE node in the plan. This enables a decision to be made as to
> whether the tuple held in the cursor is visible to the UPDATE / DELETE
> request according to the rules of concurrency. The result is that, at the
> cost of repeatedly searching the database at each UPDATE / DELETE command,
> the hash table is no longer required.
> This approach has the advantage that there is no hash table held in memory or
> on disk so it will not be memory intensive but will be processing intensive.
>
> This is a good ‘one-off’ solution to the problem and, taken in isolation is
> probably the best approach. However, if one considers the method(s) used in
> other areas of PostgreSQL, it is probably not the best solution. This option
> will probably not be used further.
>
> Option 2 MVCC via New Snapshot
>
> The executor can be changed by adding a new kind of snapshot that is
> specifically used for identifying if a given tuple, retrieved from the
> database during an update or delete statement should be visible during the
> current transaction.
>
> This approach requires a new kind of snapshot (this idea was used by Gavin
> for a previous updatable cursor patch but objections were raised.)
>
> Option 3 MVCC Via Hash Table in Memory
>
> The executor can be changed by saving into a hash table and comparing each
> tuple in the cursor with that set to check if the tuple should be visible.
> This approach has the advantage that it will be quick. It has the
> disadvantage that, since the hash table will contain all the tuples of the
> table being checked that it may use all local memory for a large table.
>
> Option 4 MVCC Via Hash Table on Disk
>
> When the UPDATE or DELETE request is executed the first time the Tid scan
> database retrieval will be done first. At this time the tuple id of each row
> in the table to be updated by the request will be available in the executor.
> These tuple ids need to be stored in a hash table that is stored to disk, as,
> if the table is large there could be a huge number of tuple ids. This data is
> then available for comparison with the individual tuple to be updated or
> deleted to check if it should be processed. The hash table will exist for the
> duration of the transaction, from BEGIN to END (or ABORT).
>
> The hash table is then used to identify if the tuple should be visible during
> the current transaction. If the tuple should be visible then the update or
> delete proceeds as usual.
>
> This approach has the advantage that it will use little memory but will be
> relatively slow as the data has to be accessed from disk.
>
> Option 5 Store Tuple Id in Snapshot.
>
> The Snapshot structure can be changed to include the tuple id. This enables
> the current state of the tuple to be identified with respect to the current
> transaction.
> The tuple id, as identified in the cursor at the point where the
> DELETE/UPDATE statement is being processed, can use the snapshot to identify
> if the tuple should be visible in the context of the current transaction.
>
>
> 2.6 Changes to the Catalog
> ----------------------------
> The Catalog needs to reflect changes introduced by the updatable cursor
> implementation. A boolean attribute ‘is_for_update’ is to be added to the
> pg_cursors implementation. It will define that the cursor is for update
> (value is FALSE) or for share (value is TRUE, the default value).
>
>
> 3 Design Assumptions
> ----------------------------
> The following design assumptions are made:
>
> As PostgreSQL8.2 does not support the SENSITIVE cursor option the tuples
> contained in a cursor can never be updated so these tuples will always appear
> in their ‘original’ form as at the start of the transaction. This is in
> breach of the SQL2003 Standard as described in 5WD-02-Foundation-2003-09.pdf,
> p 810. The standard requires the updatable cursor to be declared as sensitive.
>
> With respect to nested transactions – In PostgreSQL nested transactions are
> implemented by defining ‘save points’ via the keyword SAVEPOINT. A ‘ROLLBACK
> TO SAVEPOINT’ rolls back the database contents to the last savepoint in this
> transaction or the begin statement, whichever is closer.
>
> It is assumed that the FETCH statement is used to return only a single row
> into the cursor with each command when the cursor is updatable.
>
> According to the SQL2003 Standard Update and Delete statements may contain
> only a single base table.
>
> The DECLARE CURSOR statement is supposed to use column level locking, but
> PostgreSQL supports only row level locking. The result of this is that the
> column list that the standard requires ‘DECLARE <cursor_name> SELECT … FOR
> UPDATE OF column-list’ becomes a relation (table) list.
>
> This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27
> 003 693 481. It is confidential to the ordinary user of the email address to
> which it was addressed and may contain copyright and/or legally privileged
> information. No one else may read, print, store, copy or forward all or any
> of it or its attachments. If you receive this email in error, please return
> to sender. Thank you.
>
> If you do not wish to receive commercial email messages from Fujitsu
> Australia Software Technology Pty Ltd, please email [EMAIL PROTECTED]
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org