Re: [HACKERS] Word-smithing doc changes
On Fri, Aug 3, 2012 at 01:23:53PM -0400, Bruce Momjian wrote: On Fri, Aug 3, 2012 at 12:55:30PM -0400, Álvaro Herrera wrote: Excerpts from Bruce Momjian's message of vie ago 03 09:59:36 -0400 2012: On Fri, Aug 3, 2012 at 12:26:56AM -0400, Bruce Momjian wrote: The concurrent index documentation under discussion above was never updated, so I took a stab at it, attached. Greg, I looked at adding a mention of the virtual transaction wait to the explicit-locking section as you suggested, and found those were all user-visible locking, while this is internal locking. I did find a clear description of transaction id locking in the pg_locks system view docs, so I just referenced that. I found a way to clarify the wording further; patch attached. Looks sane to me. Are we backpatching this to 9.1? I no longer remember if the original wording is there or just in 9.2. I wasn't planning to, but will do as you suggest for 9.1. Done. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
On Fri, Aug 3, 2012 at 12:26:56AM -0400, Bruce Momjian wrote: The concurrent index documentation under discussion above was never updated, so I took a stab at it, attached. Greg, I looked at adding a mention of the virtual transaction wait to the explicit-locking section as you suggested, and found those were all user-visible locking, while this is internal locking. I did find a clear description of transaction id locking in the pg_locks system view docs, so I just referenced that. I found a way to clarify the wording further; patch attached. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml new file mode 100644 index 2ab6470..17b433a *** a/doc/src/sgml/ref/create_index.sgml --- b/doc/src/sgml/ref/create_index.sgml *** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] *** 394,408 /para para ! In a concurrent index build, the index is actually entered into the ! system catalogs in one transaction, then the two table scans occur in a ! second and third transaction. All active transactions at the time the ! second table scan starts, not just ones that already involve the table, ! have the potential to block the concurrent index creation until they ! finish. When checking for transactions that could still use the original ! index, concurrent index creation advances through potentially interfering ! older transactions one at a time, obtaining shared locks on their virtual ! transaction identifiers to wait for them to complete. /para para --- 394,407 /para para ! In a concurrent index build, the index is actually entered into ! the system catalogs in one transaction, then two table scans occur in ! two more transactions. Any transaction active when the second table ! scan starts can block concurrent index creation until it completes, ! even transactions that only reference the table after the second table ! scan starts. Concurrent index creation serially waits for each old ! transaction to complete using the method outlined in section xref ! linkend=view-pg-locks. /para para -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
Excerpts from Bruce Momjian's message of vie ago 03 09:59:36 -0400 2012: On Fri, Aug 3, 2012 at 12:26:56AM -0400, Bruce Momjian wrote: The concurrent index documentation under discussion above was never updated, so I took a stab at it, attached. Greg, I looked at adding a mention of the virtual transaction wait to the explicit-locking section as you suggested, and found those were all user-visible locking, while this is internal locking. I did find a clear description of transaction id locking in the pg_locks system view docs, so I just referenced that. I found a way to clarify the wording further; patch attached. Looks sane to me. Are we backpatching this to 9.1? I no longer remember if the original wording is there or just in 9.2. -- Álvaro Herrera alvhe...@alvh.no-ip.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
On Fri, Aug 3, 2012 at 12:55:30PM -0400, Álvaro Herrera wrote: Excerpts from Bruce Momjian's message of vie ago 03 09:59:36 -0400 2012: On Fri, Aug 3, 2012 at 12:26:56AM -0400, Bruce Momjian wrote: The concurrent index documentation under discussion above was never updated, so I took a stab at it, attached. Greg, I looked at adding a mention of the virtual transaction wait to the explicit-locking section as you suggested, and found those were all user-visible locking, while this is internal locking. I did find a clear description of transaction id locking in the pg_locks system view docs, so I just referenced that. I found a way to clarify the wording further; patch attached. Looks sane to me. Are we backpatching this to 9.1? I no longer remember if the original wording is there or just in 9.2. I wasn't planning to, but will do as you suggest for 9.1. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
On Wed, Nov 30, 2011 at 09:47:40PM -0800, Greg Smith wrote: On 11/30/2011 10:20 AM, Greg Stark wrote: Given your confusion it's clear that we have to explain that it will wait one by one for each transaction that was started before the index was created to finish. When the index was created is a fuzzy thing though. It looked to me like it makes this check at the start of Phase 2. If I read when the index was created in the manual, I would assume that meant the instant at which CREATE INDEX CONCURRENTLY started. I don't think that's actually the case though; it's actually delayed to the beginning of Phase 2 start, which can easily be hours later for big indexes. Please correct me if I'm reading that wrong. I don't think we need to explain how that's implemented. If we do it should be set aside in some way, something like (see virtual transaction id locks inhref...). Fair enough. There is this wording in the pg_locks documentation: When one transaction finds it necessary to wait specifically for another transaction, it does so by attempting to acquire share lock on the other transaction ID (either virtual or permanent ID depending on the situation). That will succeed only when the other transaction terminates and releases its locks. Linking to that instead of trying to duplicate it is a good idea. Another good, related idea might be to expand the main Concurrency Control chapter to include this information. When I re-read Explicit Locking again: http://developer.postgresql.org/pgdocs/postgres/explicit-locking.html it strikes me it would be nice to add Transaction Locks as an full entry there. The trivia around how those work is really kind of buried in the pg_locks section. I just want to keep in mind that the reader here is trying to understand how to use create index concurrently, not understand how Postgres's locking infrastructure works in general. To the extent they can be ignorant of the locking infrastructure, that's true. This operation is complicated enough that I don't think we can hide too many of the details from the users, while still letting them assess the risk and potential duration accurately. The concurrent index documentation under discussion above was never updated, so I took a stab at it, attached. Greg, I looked at adding a mention of the virtual transaction wait to the explicit-locking section as you suggested, and found those were all user-visible locking, while this is internal locking. I did find a clear description of transaction id locking in the pg_locks system view docs, so I just referenced that. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml new file mode 100644 index 2ab6470..fe2debf *** a/doc/src/sgml/ref/create_index.sgml --- b/doc/src/sgml/ref/create_index.sgml *** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] *** 395,408 para In a concurrent index build, the index is actually entered into the ! system catalogs in one transaction, then the two table scans occur in a ! second and third transaction. All active transactions at the time the ! second table scan starts, not just ones that already involve the table, ! have the potential to block the concurrent index creation until they ! finish. When checking for transactions that could still use the original ! index, concurrent index creation advances through potentially interfering ! older transactions one at a time, obtaining shared locks on their virtual ! transaction identifiers to wait for them to complete. /para para --- 395,407 para In a concurrent index build, the index is actually entered into the ! system catalogs in one transaction, then the two table scans occur in ! second and third transactions. Any transaction active when the ! second table scan starts, not just those that have already referenced ! the table, can block concurrent index creation until it completes. ! When waiting for all old transactions, concurrent index creation ! serially waits for each old transaction to complete, as outlined in ! section xref linkend=view-pg-locks. /para para -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
Excerpts from Greg Stark's message of sáb jun 25 21:01:36 -0400 2011: I think this commit was ill-advised: http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372 Seems way to implementation-specific and detailed for a user to make heads or tails of. Except in the sections talking about locking internals we don't talk about shared locks on virtual transactions identifiers we just talk about waiting for a transaction to complete. Looks like I missed this when it passed by before, and looks like Greg Stark may have missed the message on pgsql-docs that kicked this all off: http://archives.postgresql.org/message-id/4ddb64cb.7070...@2ndquadrant.com I will happily accept that the description there may have suffered from me not using all of the terms optimally, and that the resulting commit could be improved. Some more feedback to get the description correct and useful would be much appreciated. What I cannot agree with is that idea that the implementation details I suggested documenting should not be. There are extremely user-hostile things that can happen here, and that are unique to this command. Saying this is too complicated for users to make heads or tails of may very well be true in many cases, but I think it's not giving PostgreSQL users very much credit. And when problems with this happen, and I wouldn't have spent any time on this if they didn't, right now the only way to make heads or tails of it is to read the source code. If the code was simple, quick, and had no failure modes, it would be fine to not describe it. This is complicated, the run time cannot be bounded, and it can ripple to nasty lock queue issues--at some impossible to predict future time, long after you start the creation. I don't have a good idea how to unload the potential foot gun. The best I could think of after being shot with it was describing how it fires. And looping over the transactions one by one is purely an implementation detail and uninteresting to users. That particular suggestion came from me having a painful session I didn't want anyone else to ever go through again. By the end of that, this implementation detail felt like the most important missing piece of PostgreSQL documentation in the world to me--I'm too busy to send in doc patches describing things that I haven't been shot by. To provide some more context, the server I ran into this on always has at least 2 reports that take 10 to 16 hours to run active. I happily kicked off a concurrent index build on a heavily bloated 1GB table whose indexes are typically 5GB, which I expect to take a few minutes given the small size involved. Six hours later, when I come back and discover it's still not done, I find a single lock waiting for a transaction to finish.Since I'm used to multiple locks forming into a tree structure, and I only see one, I expect I'm OK once that's done. Fine; I estimate how much time that report has left and leave for a bit. Four hours later, I come back. That original transaction lock is gone. Now it's created a new one I didn't expect, moving onto the second oldest report active. I actually have six more hours to go still. This locking pattern is unique to this command, and if I'd had the slightest idea that it worked this way I'd have approached the rebuild differently. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
On Wed, Nov 30, 2011 at 3:02 AM, Greg Smith g...@2ndquadrant.com wrote: I will happily accept that the description there may have suffered from me not using all of the terms optimally, and that the resulting commit could be improved. Some more feedback to get the description correct and useful would be much appreciated. What I cannot agree with is that idea that the implementation details I suggested documenting should not be. There are extremely user-hostile things that can happen here, and that are unique to this command. Saying this is too complicated for users to make heads or tails of may very well be true in many cases, but I think it's not giving PostgreSQL users very much credit. And when problems with this happen, and I wouldn't have spent any time on this if they didn't, right now the only way to make heads or tails of it is to read the source code. +1. If we only document approximately how it works, then that's less work, but it's also less useful. Greg's attempt to document *exactly* how it works was kind of klunky, but I think that can and should be improved, not replaced with wording that's more vague and therefore easier to write. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
On Wed, Nov 30, 2011 at 8:02 AM, Greg Smith g...@2ndquadrant.com wrote: Except in the sections talking about locking internals we don't talk about shared locks on virtual transactions identifiers we just talk about waiting for a transaction to complete. What I cannot agree with is that idea that the implementation details I suggested documenting should not be. What I'm suggesting is translating things like shared locks on virtual transaction identifiers into what that means for users. Namely saying something like waiting for a transaction to complete. Given your confusion it's clear that we have to explain that it will wait one by one for each transaction that was started before the index was created to finish. I don't think we need to explain how that's implemented. If we do it should be set aside in some way, something like (see virtual transaction id locks in href...). I just want to keep in mind that the reader here is trying to understand how to use create index concurrently, not understand how Postgres's locking infrastructure works in general. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
On 11/30/2011 10:20 AM, Greg Stark wrote: Given your confusion it's clear that we have to explain that it will wait one by one for each transaction that was started before the index was created to finish. When the index was created is a fuzzy thing though. It looked to me like it makes this check at the start of Phase 2. If I read when the index was created in the manual, I would assume that meant the instant at which CREATE INDEX CONCURRENTLY started. I don't think that's actually the case though; it's actually delayed to the beginning of Phase 2 start, which can easily be hours later for big indexes. Please correct me if I'm reading that wrong. I don't think we need to explain how that's implemented. If we do it should be set aside in some way, something like (see virtual transaction id locks inhref...). Fair enough. There is this wording in the pg_locks documentation: When one transaction finds it necessary to wait specifically for another transaction, it does so by attempting to acquire share lock on the other transaction ID (either virtual or permanent ID depending on the situation). That will succeed only when the other transaction terminates and releases its locks. Linking to that instead of trying to duplicate it is a good idea. Another good, related idea might be to expand the main Concurrency Control chapter to include this information. When I re-read Explicit Locking again: http://developer.postgresql.org/pgdocs/postgres/explicit-locking.html it strikes me it would be nice to add Transaction Locks as an full entry there. The trivia around how those work is really kind of buried in the pg_locks section. I just want to keep in mind that the reader here is trying to understand how to use create index concurrently, not understand how Postgres's locking infrastructure works in general. To the extent they can be ignorant of the locking infrastructure, that's true. This operation is complicated enough that I don't think we can hide too many of the details from the users, while still letting them assess the risk and potential duration accurately. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
Alvaro Herrera wrote: Excerpts from Greg Stark's message of s??b jun 25 21:01:36 -0400 2011: I think this commit was ill-advised: http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372 Seems way to implementation-specific and detailed for a user to make heads or tails of. Except in the sections talking about locking internals we don't talk about shared locks on virtual transactions identifiers we just talk about waiting for a transaction to complete. Hmm, right. And looping over the transactions one by one is purely an implementation detail and uninteresting to users. Also it uses ill-defined terms like active transactions, potentially interfering older transactions, and original index -- from the user's point of view there's only one index and it just isn't completely built yet. Wow, that's a lot of mistakes for a single paragraph, sorry about that. Are we not yet in string-freeze though? I'll go ahead and edit it if people don't mind. I'm curious to see the original complaint though. I don't -- please go ahead. Original complaint in Message-id 4ddb64cb.7070...@2ndquadrant.com I have simplified the concurrent index build docs with the attached patch. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml new file mode 100644 index e8a7caf..7391a5f *** a/doc/src/sgml/ref/create_index.sgml --- b/doc/src/sgml/ref/create_index.sgml *** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] *** 400,413 para In a concurrent index build, the index is actually entered into the ! system catalogs in one transaction, then the two table scans occur in a ! second and third transaction. All active transactions at the time the ! second table scan starts, not just ones that already involve the table, ! have the potential to block the concurrent index creation until they ! finish. When checking for transactions that could still use the original ! index, concurrent index creation advances through potentially interfering ! older transactions one at a time, obtaining shared locks on their virtual ! transaction identifiers to wait for them to complete. /para para --- 400,409 para In a concurrent index build, the index is actually entered into the ! system catalogs in one transaction; two table scans then occur in a ! second and third transaction. The concurrent index build will not ! complete until all running transactions can see the new index ! definition. /para para -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
Excerpts from Greg Stark's message of sáb jun 25 21:01:36 -0400 2011: I think this commit was ill-advised: http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372 Seems way to implementation-specific and detailed for a user to make heads or tails of. Except in the sections talking about locking internals we don't talk about shared locks on virtual transactions identifiers we just talk about waiting for a transaction to complete. Hmm, right. And looping over the transactions one by one is purely an implementation detail and uninteresting to users. Also it uses ill-defined terms like active transactions, potentially interfering older transactions, and original index -- from the user's point of view there's only one index and it just isn't completely built yet. Wow, that's a lot of mistakes for a single paragraph, sorry about that. Are we not yet in string-freeze though? I'll go ahead and edit it if people don't mind. I'm curious to see the original complaint though. I don't -- please go ahead. Original complaint in Message-id 4ddb64cb.7070...@2ndquadrant.com -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Word-smithing doc changes
On Sat, Jun 25, 2011 at 9:01 PM, Greg Stark st...@mit.edu wrote: I think this commit was ill-advised: http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372 In a concurrent index build, the index is actually entered into the system catalogs in one transaction, then the two table scans occur in a - second and third transaction. + second and third transaction. All active transactions at the time the + second table scan starts, not just ones that already involve the table, + have the potential to block the concurrent index creation until they + finish. When checking for transactions that could still use the original + index, concurrent index creation advances through potentially interfering + older transactions one at a time, obtaining shared locks on their virtual + transaction identifiers to wait for them to complete. Seems way to implementation-specific and detailed for a user to make heads or tails of. Except in the sections talking about locking internals we don't talk about shared locks on virtual transactions identifiers we just talk about waiting for a transaction to complete. And looping over the transactions one by one is purely an implementation detail and uninteresting to users. Also it uses ill-defined terms like active transactions, potentially interfering older transactions, and original index -- from the user's point of view there's only one index and it just isn't completely built yet. Are we not yet in string-freeze though? I'll go ahead and edit it if people don't mind. I'm curious to see the original complaint though. We don't have a string freeze, and certainly not for the documentation, so if you'd like to wordsmith some more, have at it. But it would probably be best to post your revised version and solicit feedback before committing, since there was quite a bit of discussion about that change before it was made. (Sorry, don't have the pointer at the moment...) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Word-smithing doc changes
I think this commit was ill-advised: http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372 In a concurrent index build, the index is actually entered into the system catalogs in one transaction, then the two table scans occur in a -second and third transaction. +second and third transaction. All active transactions at the time the +second table scan starts, not just ones that already involve the table, +have the potential to block the concurrent index creation until they +finish. When checking for transactions that could still use the original +index, concurrent index creation advances through potentially interfering +older transactions one at a time, obtaining shared locks on their virtual +transaction identifiers to wait for them to complete. Seems way to implementation-specific and detailed for a user to make heads or tails of. Except in the sections talking about locking internals we don't talk about shared locks on virtual transactions identifiers we just talk about waiting for a transaction to complete. And looping over the transactions one by one is purely an implementation detail and uninteresting to users. Also it uses ill-defined terms like active transactions, potentially interfering older transactions, and original index -- from the user's point of view there's only one index and it just isn't completely built yet. Are we not yet in string-freeze though? I'll go ahead and edit it if people don't mind. I'm curious to see the original complaint though. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers