Re: [HACKERS] Question: pg_class attributes and race conditions ?
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? The main objective of HOT is to enable retail vacuum of HOT-updated tuples. Doing the above would make it useless for that purpose, at least when there's more than one index on the table. Granted, there's a lot of tables with just one index out there, but it's a big limitation nevertheless. An extension of that idea, though is to store a flag per index in the HOT-updated tuple. We would then need a mapping between bits in the tuple header to indexes, for example as a new column in pg_index. I had an interesting thought this morning, these bits might less us do retail vacuum in a lot of cases. When you traverse an index and find that an index pointer points to a DEAD tuples you set the LP_DELETE flag. If at that time you set the tuple's bit indicating that the index pointer for that index then if we find all the bits are set (and whatever condition we have for ensuring that all indexes are represented by bits) we know there are no index pointers left and the tuple can now be retail vacuumed. I think in order for this to work we may want a rule that we don't have to dirty a page to set a index pointer missing bit though we would certainly need to dirty it (and wal log) it if we *clear* a bit. In other words a set bit would be a guarantee that the index pointer was missing but a clear bit would only be a hint that it might be present. The main problem with this is that it would necessitate WAL logging setting the LP_DELETE flag on index pointers which could be a large overhead for a SELECT. This interacts with two other proposed changes, HOT and truncating line pointers, which I think are both valuable. But I think it works with both. What I would suggest is the following: When we follow an index pointer, find a (non-truncated) DEAD tuple we truncate the line pointer, and initialize the length bits to an empty bitmask of index pointer missing flags. In the case of a HOT-updated tuple HOT would have to provide enough information for us to initialize the bitmask -- it's the same information that it needs anyways. Then we set the LP_DELETE flag on our own index pointer (and wal log it) and set the corresponding bit on the page. If we find an index pointer pointing to a truncated line pointer we set the flag in the bitmask. If we find that all the bits are set indicating that all indexes have successfully set their LP_DELETE flag then we the line pointer can be marked as !LP_USED. Of course only certain types of indexes would be able to do this, indexes that only ever have exactly one pointer to every tuple, and which have space for an LP_DELETED or equivalent flag. I believe currently this includes all except GIN. The part I'm most worried about with both this and the equivalent bits for HOT are maintaining the mapping from index to bit. I think it could be worked out, but it has to be done carefully. Dropping an index can't ever change the mapping and creating a new index can't ever leave a tuple with a bit incorrectly set for that index. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Pavan Deolasee wrote: What is the safest way to access/modify the pg_class attribute and still avoid any race conditions with the other backends ? A specific example is: To solve the CREATE INDEX problem with HOT, I am thinking of adding (along with other things) a pg_class boolean attribute, say hot_update_enable. All backends are supposed to check this attribute before they perform an UPDATE. The attribute would usually be available in relation-rd_rel My understanding is that the backend which sets this attribute must first acquire a lock on the heap relation of sufficient strength so as to ensure that there are no concurrent UPDATErs, update the pg_class row and then release the lock on the relation. This would ensure that no backend has a stale Relation pointer with stale value of hot_update_enable. FWIW this is pretty much the same I wanted to do with setting relfrozenxid to FrozenTransactionId. To this end I wrote a patch to add a catalog pg_ntclass (later renamed to pg_class_nt), which was ultimately rejected for reasons I don't remember at the time. Maybe it would be illuminating to investigate that -- please see the archives. (I still think it would be good to have a pg_class_nt catalog, so it's not a dead idea). -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Pavan Deolasee [EMAIL PROTECTED] writes: My understanding is that the backend which sets this attribute must first acquire a lock on the heap relation of sufficient strength so as to ensure that there are no concurrent UPDATErs, update the pg_class row and then release the lock on the relation. In what context are you proposing to do that, and won't this high-strength lock in itself lead to deadlocks? The whole thing sounds exceedingly ugly anyway --- for example what happens if the backend doing the CREATE INDEX fails and is therefore unable to clear the flag again? regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Tom Lane wrote: In what context are you proposing to do that, and won't this high-strength lock in itself lead to deadlocks? The whole thing sounds exceedingly ugly anyway --- for example what happens if the backend doing the CREATE INDEX fails and is therefore unable to clear the flag again? Let me state the problem and a vague solution I am thinking of. I would appreciate comments and suggestions. The major known issue left with HOT is support for CREATE INDEX and CREATE INDEX CONCURRENTLY. The problem is with HEAP_ONLY tuples in the heap which do not have index entries in the existing indexes. When we build a new index, some or all of the HEAP_ONLY tuples may need index entries in the new index. It would be very ugly if we try to keep the existing indexes without index entries for those tuples. A clean solution would be to add index entries for the HEAP_ONLY tuples in the existing indexes and break all the HOT-chains. I would leave the details, but rather explain what I have in mind at high level. Any help to fill in the details or any suggestions to do things differently would immensely help. This is what I have in mind: In the context of CREATE INDEX [CONCURRENTLY], We first disable HOT-updates on the table. This would ensure that no new HOT tuples are added while we CHILL the heap. (How do we do this ?) We then start scanning the heap and start building the new index. If a HEAP_ONLY tuple is found which needs to be indexed, we mark the tuple with a CHILL_IN_PROGRESS flag and insert index entries into all the existing indexes. (The buffer is exclusively locked and the operation is WAL logged). We do this until entire heap is scanned. At this point, we would have inserted missing index entries for the HEAP_ONLY tuples. Till this point, we don't use the direct index entries to fetch the HEAP_ONLY tuples to avoid duplicate fetches of the same tuple. We now wait for all the concurrent index scans to end and then disable HOT-chain following logic to fetch tuples. (How do we do this ?) At this point, all index scans would ONLY use the direct path from the index to fetch tuples. The HOT-chains are not followed to avoid duplicate fetches of the same tuple. A second pass over the heap is now required to clear the CHILL_IN_PROGRESS, HEAP_ONLY and HEAP_HOT_UPDATED flags. At the end of this step, all the indexes and the table are in sync. Once again we need to ensure that there are no concurrent index scans in progress and then enable HOT-fetch. Also, HOT-updates can be turned on. If CREATE INDEX crashes, VACUUM is required to clear the CHILL_IN_PROGRESS flags and the corresponding index entries are removed. Since VACUUM runs mutually exclusive to CREATE INDEX, we don't need any special mechanism to handle race conditions between them. There are some other details like running multiple CREATE INDEX in parallel and still be able to CHILL the table safely. May be one of them needs to act as the chiller and others wait for it finish successfully. Any thoughts on the overall approach ? Any suggestions to simplify things or any alternate designs ? Can something as simple as CHILLing the table holding VACUUM FULL strength lock be acceptable ? Thanks, Pavan -- EnterpriseDBhttp://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Pavan Deolasee [EMAIL PROTECTED] writes: Any thoughts on the overall approach ? Fragile and full of race conditions :-(. I thought from the beginning that CREATE INDEX might be a showstopper for the whole HOT concept, and it's starting to look like that's the case. I think what we need to get away from is the assumption that HOT-ness for one index is the same as HOT-ness for all. What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: Any thoughts on the overall approach ? Fragile and full of race conditions :-(. I thought from the beginning that CREATE INDEX might be a showstopper for the whole HOT concept, and it's starting to look like that's the case. I think what we need to get away from is the assumption that HOT-ness for one index is the same as HOT-ness for all. What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? Just to throw my two bits in here :). If we do that, how does that effect the idea that most people in the web world use (id serial primary key), even though that is not what they are searching on? More specifically, does HOT help conditions where a composite comes into play (session_id,last_active) ... which would be a more heavily updated index than just the primary key. Sincerely, Joshua D. Drake regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Tom Lane wrote: What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? The main objective of HOT is to enable retail vacuum of HOT-updated tuples. Doing the above would make it useless for that purpose, at least when there's more than one index on the table. Granted, there's a lot of tables with just one index out there, but it's a big limitation nevertheless. An extension of that idea, though is to store a flag per index in the HOT-updated tuple. We would then need a mapping between bits in the tuple header to indexes, for example as a new column in pg_index. Let's assume that we'd use one byte in the tuple header to store HOT-flags. That's enough to store the HOT-status for 8 indexes. A set bit means that the index corresponding that bit doesn't have an index pointer in it. When you CREATE INDEX, assign a bit for the new index that's not currently in use. When you scan the table to build the index, clear that bit for every tuple if set and insert index entry as usual. DROP INDEX wouldn't need to scan the heap to clear the flags, because we clear them on CREATE INDEX when necessary. If you run out of bits in the header, IOW have more than 8 indexes on a table, indexes unlucky enough to not have a bit assigned to them wouldn't be HOT-updateable. This would also enable us to skip index inserts for those indexes whose key columns are not updated, and do the index inserts as usual for the rest. The limitation that you can only retail vacuum HOT-updated tuples when none of the indexed keys were changed remains, but we've accepted that already. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: Any thoughts on the overall approach ? Fragile and full of race conditions :-(. Yes, it looks a bit complex. But IMHO we can get around that. Do you have any ideas in mind about doing that ? I thought from the beginning that CREATE INDEX might be a showstopper for the whole HOT concept, and it's starting to look like that's the case. I remember you raised this concern very early, but I am hopeful that we would be able to solve this. Would it be acceptable to have a simple (though not the best) solution for this release and then improve later on ? As I mentioned earlier, one option is to CHILL the table, if required, holding AccessExclusive lock, just like VACUUM FULL. I am assuming here that CREATE INDEX is not such a common activity, isn't that true ? I think what we need to get away from is the assumption that HOT-ness for one index is the same as HOT-ness for all. What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? I think that will take away the ability to reuse HEAP_ONLY tuples without vacuuming the heap and index. Thanks, Pavan -- EnterpriseDBhttp://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Question: pg_class attributes and race conditions ?
On Fri, 2007-03-16 at 21:56 +0530, Pavan Deolasee wrote: Any thoughts on the overall approach ? Any suggestions to simplify things or any alternate designs ? Well your design is very different from what we discussed, so I think I should post my proposed design alongside this, for further discussion. - - - Proposed solutions for CREATE INDEX and CREATE INDEX CONCURRENTLY. CREATE INDEX CONCURRENTLY has no feature changes to work with HOT. CREATE INDEX works normally, except when HOT tuples are found, in which case special processing occurs requiring additional locking. The index build occurs in a single scan, as now. CREATE INDEX CONCURRENTLY - We perform no chilling during the initial scan. We index the tuple identified by SnapshotNow, but we take the root tuple's htid, not the htid of the tuple version being indexed. We assume that the tuple version indexed will be the root of the HOT chain by the time the index is complete. Currently, we wait until all pre-existing transactions exit before we allow this to COMMIT. With HOT, we simply move the wait so it occurs *before* the second scan, then we can prune the HOT chains as we pass through the heap on the second scan. There will be no pre-existing HOT tuples and so no chilling is required. CREATE INDEX We add a field, xchill, to pg_class that stores TransactionIds. This is set to InvalidTransactionId if no index has been built yet. The concept of main indexer is introduced, so we allow multiple concurrent index builds, but only one of these can chill tuples at a time. 1. In IndexBuildHeapScan, as we pass through the table: a) if we find any any HOT rows, we check xchill and do one of steps (i-iii). Until this point, it hasn't been important whether we are the main or a secondary indexer. i) if xchill is InvalidTransactionId or is committed then we attempt to become main indexer immediately, following these steps before we continue building the index (1b) -- If the table is temp, or if we created the table then we immediately become the main indexer, so return immediately. If the table being indexed is already visible to everybody, then: -- Update pg_class entry for the table, setting the xchill field for the table to the builder's Xid. (Use heap_inplace_update for this, which is OK, whether we commit or abort). -- acquire AccessExclusiveLock on all existing indexes (not the heap) ii) If xchill is in-progress we wait for that transaction to complete, then do either step i) or iii). We cannot continue building our index until the other transaction commits because we cannot yet see the other index, yet we have to insert into it in order to correctly chill a tuple to allow *our* index to be built. iii) if xchill is aborted we abort also, saying that a VACUUM is needed. b) If we get here then we are the main indexer and can chill tuples. As we move through the scan we chill all HOT tuples, mark them HEAP_CHILL_IN_PROGRESS, write WAL for this and insert index entries for them in all existing indexes, as well as this one. Then remove CHILL_IN_PROGRESS flags, without writing WAL. c) release locks on indexes, before end of transaction 2. If we crash or a transaction abort occurs: - we cannot prune a HEAP_ONLY_TUPLE that points to a tuple with HEAP_CHILL_IN_PROGRESS. - VACUUM must be used to clean up after an aborted index build and needs some additional code to allow this to occur. 3. Concurrent index builds are allowed. If we are not the main indexer, then we can attempt to build an index, but any scan that sees a HOT tuple will block and wait for the main index builder to complete before it proceeds. 4. When an indexscan reads the table, if it finds a HEAP_CHILL_IN_PROGRESS tuple it may or may not be valid. Concurrent index scans and tuple chilling can mean that an index scan find the same tuple twice, by different routes, if a CREATE INDEX crashed. To avoid this an IndexScan will only find a tuple visible if it came across a HEAP_CHILL_IN_PROGRESS tuple using an indirect route, i.e. it followed the path from root-HOT tuple. In this design, CREATE INDEX does have a deadlock risk when it is used within a transaction *and* the index is being built on a publicly visible table (i.e. not just-built and not temp). IMHO that risk is acceptable, since if users are worried about concurrent access to a table during CREATE INDEX they can use CREATE INDEX CONCURRENTLY. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Joshua D. Drake [EMAIL PROTECTED] writes: Just to throw my two bits in here :). If we do that, how does that effect the idea that most people in the web world use (id serial primary key), even though that is not what they are searching on? affect. But I think you're right that generally you'll have two indexes. More specifically, does HOT help conditions where a composite comes into play (session_id,last_active) ... which would be a more heavily updated index than just the primary key. Well if you're indexing a column that you're updating then you've already failed your saving throw. The case we're trying to deal with is when you're updating columns that *aren't* indexed and therefore really don't need redundant index pointers for each tuple version with identical to the old versions. Especially since those index pointers are what's preventing us from vacuuming the old tuple versions. If you are updating an index key then there's no question you're going to need vacuum to clean out your index. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Heikki Linnakangas wrote: Tom Lane wrote: What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? The main objective of HOT is to enable retail vacuum of HOT-updated tuples. Doing the above would make it useless for that purpose, at least when there's more than one index on the table. Granted, there's a lot of tables with just one index out there, but it's a big limitation nevertheless. Agree. An extension of that idea, though is to store a flag per index in the HOT-updated tuple. We would then need a mapping between bits in the tuple header to indexes, for example as a new column in pg_index. I like the idea. The major objection would be that it adds a byte to the tuple header which when considered along with the null bitmap, may actually make the header 8 bytes larger in the worst case. Also, I am also worried about the additional complexity introduced with this. We can and should work on this idea, I am wondering whether it would be too much to do before the feature freeze. I am personally inclined towards doing something simpler to tackle the CREATE INDEX issue at the moment. But if that is not acceptable and/or you or anyone else is willing help me on this, we can work on a better solution. Thanks, Pavan -- EnterpriseDBhttp://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Question: pg_class attributes and race conditions ?
On Fri, 2007-03-16 at 12:40 -0400, Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: Any thoughts on the overall approach ? Fragile and full of race conditions :-(. I thought from the beginning that CREATE INDEX might be a showstopper for the whole HOT concept, and it's starting to look like that's the case. Seems like we can fix all but some strange CREATE INDEX use cases. Since we have CREATE INDEX CONCURRENTLY, seems like HOT is a showstopper for the whole CREATE INDEX concept. I think what we need to get away from is the assumption that HOT-ness for one index is the same as HOT-ness for all. Sounds interesting. I'd not considered that before. What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? On its own, I don't think this is a sufficiently wide use-case. Perhaps we should do this PLUS make HOT-semantics optional for each additional index. i.e. HOT is always enforced on primary indexes and optionally on other indexes (but not by default). If you accept the HOT option on an index, you then accept the additional issues surrounding chilling tuples. Bear in mind that there aren't any at all if you use CREATE INDEX CONCURRENTLY and many other cases. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Just to throw my two bits in here :). If we do that, how does that effect the idea that most people in the web world use (id serial primary key), even though that is not what they are searching on? affect. But I think you're right that generally you'll have two indexes. More specifically, does HOT help conditions where a composite comes into play (session_id,last_active) ... which would be a more heavily updated index than just the primary key. Well if you're indexing a column that you're updating then you've already failed your saving throw. Just for everyone who missed this. Greg Stark obviously spends his time (or at some time) playing|ed DD. I have an Epic level Sorcerer, how about you Greg? ;) Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Question: pg_class attributes and race conditions ?
Simon Riggs wrote: What if we only applied HOT to primary-key indexes, so that there was certainly not more than one index per table that the property applies to? On its own, I don't think this is a sufficiently wide use-case. Perhaps we should do this PLUS make HOT-semantics optional for each additional index. i.e. HOT is always enforced on primary indexes and optionally on other indexes (but not by default). Here's is a very simple, low-tech idea. How about checking whether the new index requires chilling tuples; if it does, then elog(ERROR) until all the indexes have been manually chilled, which would be done with an ALTER INDEX ... CHILL command or something like that. Only when all indexes are known chilled, you can create another one, and then the user can hotify indexes as appropriate. (Disclaimer: I haven't followed the HOT design closely to know if this makes enough sense) -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Question: pg_class attributes and race conditions ?
On Fri, 2007-03-16 at 16:59 -0400, Alvaro Herrera wrote: Here's is a very simple, low-tech idea. How about checking whether the new index requires chilling tuples; if it does, then elog(ERROR) until all the indexes have been manually chilled, which would be done with an ALTER INDEX ... CHILL command or something like that. Only when all indexes are known chilled, you can create another one, and then the user can hotify indexes as appropriate. Well, I've spent two weeks searching for a design that does CREATE INDEX without changing existing functionality. What's been proposed is very close, but not exact. CREATE INDEX CONCURRENTLY can work, so we're just discussing the other increasingly edgy cases. I agree some kind of compromise on CREATE INDEX seems to be required if we want HOT without some drastic reductions in function. I'm happy to go for low tech approaches, or anything really. Simple is good, so we can hit feature freeze. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match