On Fri, 2007-03-16 at 21:56 +0530, Pavan Deolasee wrote: > Any thoughts on the overall approach ? Any suggestions to > simplify things or any alternate designs ?
Well your design is very different from what we discussed, so I think I should post my proposed design alongside this, for further discussion. - - - Proposed solutions for CREATE INDEX and CREATE INDEX CONCURRENTLY. CREATE INDEX CONCURRENTLY has no feature changes to work with HOT. CREATE INDEX works normally, except when HOT tuples are found, in which case special processing occurs requiring additional locking. The index build occurs in a single scan, as now. CREATE INDEX CONCURRENTLY ------------------------- We perform no chilling during the initial scan. We index the tuple identified by SnapshotNow, but we take the root tuple's htid, not the htid of the tuple version being indexed. We assume that the tuple version indexed will be the root of the HOT chain by the time the index is complete. Currently, we wait until all pre-existing transactions exit before we allow this to COMMIT. With HOT, we simply move the wait so it occurs *before* the second scan, then we can prune the HOT chains as we pass through the heap on the second scan. There will be no pre-existing HOT tuples and so no chilling is required. CREATE INDEX ------------ We add a field, xchill, to pg_class that stores TransactionIds. This is set to InvalidTransactionId if no index has been built yet. The concept of "main indexer" is introduced, so we allow multiple concurrent index builds, but only one of these can chill tuples at a time. 1. In IndexBuildHeapScan, as we pass through the table: a) if we find any any HOT rows, we check xchill and do one of steps (i-iii). Until this point, it hasn't been important whether we are the main or a secondary indexer. i) if xchill is InvalidTransactionId or is committed then we attempt to become main indexer immediately, following these steps before we continue building the index (1b) -- If the table is temp, or if we created the table then we immediately become the main indexer, so return immediately. If the table being indexed is already visible to everybody, then: -- Update pg_class entry for the table, setting the xchill field for the table to the builder's Xid. (Use heap_inplace_update for this, which is OK, whether we commit or abort). -- acquire AccessExclusiveLock on all existing indexes (not the heap) ii) If xchill is in-progress we wait for that transaction to complete, then do either step i) or iii). We cannot continue building our index until the other transaction commits because we cannot yet see the other index, yet we have to insert into it in order to correctly chill a tuple to allow *our* index to be built. iii) if xchill is aborted we abort also, saying that a VACUUM is needed. b) If we get here then we are the main indexer and can chill tuples. As we move through the scan we chill all HOT tuples, mark them HEAP_CHILL_IN_PROGRESS, write WAL for this and insert index entries for them in all existing indexes, as well as this one. Then remove CHILL_IN_PROGRESS flags, without writing WAL. c) release locks on indexes, before end of transaction 2. If we crash or a transaction abort occurs: - we cannot prune a HEAP_ONLY_TUPLE that points to a tuple with HEAP_CHILL_IN_PROGRESS. - VACUUM must be used to clean up after an aborted index build and needs some additional code to allow this to occur. 3. Concurrent index builds are allowed. If we are not the main indexer, then we can attempt to build an index, but any scan that sees a HOT tuple will block and wait for the main index builder to complete before it proceeds. 4. When an indexscan reads the table, if it finds a HEAP_CHILL_IN_PROGRESS tuple it may or may not be valid. Concurrent index scans and tuple chilling can mean that an index scan find the same tuple twice, by different routes, if a CREATE INDEX crashed. To avoid this an IndexScan will only find a tuple visible if it came across a HEAP_CHILL_IN_PROGRESS tuple using an indirect route, i.e. it followed the path from root->HOT tuple. In this design, CREATE INDEX does have a deadlock risk when it is used within a transaction *and* the index is being built on a publicly visible table (i.e. not just-built and not temp). IMHO that risk is acceptable, since if users are worried about concurrent access to a table during CREATE INDEX they can use CREATE INDEX CONCURRENTLY. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings