On Sun, Mar 27, 2022 at 11:12 PM David G. Johnston <david.g.johns...@gmail.com> wrote: > > On Sun, Mar 27, 2022 at 11:17 AM James Coleman <jtc...@gmail.com> wrote: >> >> Hmm, I didn't realize that was project policy, > > > Guideline/Rule of Thumb is probably a better concept.
Ah, OK, thanks. >> >> but I'm a bit >> surprised given that the sentence which 0001 replaces seems like a >> direct violation of that also: "In neither case is a rewrite of the >> table required." >> > > IMO mostly due to the structuring of the paragraphs; something like the > following makes it less problematic (and as shown below may be sufficient to > address the purpose of this patch) > > """ > [...] > The following alterations of the table require the entire table, and/or its > indexes, to be rewritten; which may take a significant amount of time for a > large table, and will temporarily require as much as double the disk space. > > Changing the type of an existing column will require the entire table and its > indexes to be rewritten. As an exception, if the USING clause does not change > the column contents and the old type is either binary coercible to the new > type, or an unconstrained domain over the new type, a table rewrite is not > needed; but any indexes on the affected columns must still be rewritten. > > Adding a column with a volatile DEFAULT also requires the entire table and > its indexes to be rewritten. > > The reason a non-volatile (or absent) DEFAULT does not require a rewrite of > the table is because the DEFAULT expression (or NULL) is evaluated at the > time of the statement and the result is stored in the table's metadata. > > The following alterations of the table require that it be scanned in its > entirety to ensure that no existing values are contrary to the new > constraints placed on the table. Constraints backed by indexes will scan the > table as a side-effect of populating the new index with data. > > Adding a CHECK constraint requires scanning the table to verify that existing > rows meet the constraint. The same goes for adding a NOT NULL constraint to > an existing column. > > A newly attached partition requires scanning the table to verify that > existing rows meet the partition constraint. > > A foreign key constraint requires scanning the table to verify that all > existing values exist on the referenced table. > > The main reason for providing the option to specify multiple changes in a > single ALTER TABLE is that multiple table scans or rewrites can thereby be > combined into a single pass over the table. > > Scanning a large table to verify a new constraint can take a long time, and > other updates to the table are locked out until the ALTER TABLE ADD > CONSTRAINT command is committed. For CHECK and FOREIGN KEY constraints there > is an option, NOT VALID, that reduces the impact of adding a constraint on > concurrent updates. With NOT VALID, the ADD CONSTRAINT command does not scan > the table and can be committed immediately. After that, a VALIDATE CONSTRAINT > command can be issued to verify that existing rows satisfy the constraint. > The validation step does not need to lock out concurrent updates, since it > knows that other transactions will be enforcing the constraint for rows that > they insert or update; only pre-existing rows need to be checked. Hence, > validation acquires only a SHARE UPDATE EXCLUSIVE lock on the table being > altered. (If the constraint is a foreign key then a ROW SHARE lock is also > required on the table referenced by the constraint.) In addition to improving > concurrency, it can be useful to use NOT VALID and VALIDATE CONSTRAINT in > cases where the table is known to contain pre-existing violations. Once the > constraint is in place, no new violations can be inserted, and the existing > problems can be corrected at leisure until VALIDATE CONSTRAINT finally > succeeds. > > The DROP COLUMN form does not physically remove the column, but simply makes > it invisible to SQL operations. Subsequent insert and update operations in > the table will store a null value for the column. Thus, dropping a column is > quick but it will not immediately reduce the on-disk size of your table, as > the space occupied by the dropped column is not reclaimed. The space will be > reclaimed over time as existing rows are updated. > > To force immediate reclamation of space occupied by a dropped column, you can > execute one of the forms of ALTER TABLE that performs a rewrite of the whole > table. This results in reconstructing each row with the dropped column > replaced by a null value. > > The rewriting forms of ALTER TABLE are not MVCC-safe. After a table rewrite, > the table will appear empty to concurrent transactions, if they are using a > snapshot taken before the rewrite occurred. See Section 13.5 for more details. > [...] > """ > > I'm liking the idea of breaking out multiple features into their own > sentences or paragraphs instead of saying: > > "Adding a column with a volatile DEFAULT or changing the type of an existing > column" > > "Adding a CHECK or NOT NULL constraint" > > This later combination probably doesn't catch my attention except for this > discussion and the fact that there are multiple ways to add these constraints > and we might as well be clear about whether ALTER COLUMN or ADD COLUMN makes > a difference. On that note, the behavior implied by this wording is that > adding a check constraint even during ADD COLUMN will result in scanning the > table even when a table rewrite is not required. If that is the case at > present nothing actually says that - if one agrees that the exact same > sentence doesn't imply that a table scan is performed when adding a NOT NULL > constraint during ADD COLUMN (which doesn't happen). > That seems like enough material to extract out from the ALTER TABLE page and > stick elsewhere if one is so motivated. There may be other stuff too - but > the next paragraph covers some SET DATA TYPE nuances which seem like a > different dynamic. Coming back to this with fresh eyes in the morning and comparing your idea above to the existing doc page, and I really like this approach. I'll be marking this specific patch as withdrawn and opening a new patch for the restructuring. I also noticed an error in the existing docs (we no longer need to rebuild indexes when a table rewrite is skipped), and I'll be sending a separate patch to fix that separately. Thanks, James Coleman