On Sun, Mar 27, 2022 at 11:12 PM David G. Johnston
<david.g.johns...@gmail.com> wrote:
>
> On Sun, Mar 27, 2022 at 11:17 AM James Coleman <jtc...@gmail.com> wrote:
>>
>> Hmm,  I didn't realize that was project policy,
>
>
> Guideline/Rule of Thumb is probably a better concept.

Ah, OK, thanks.

>>
>> but I'm a bit
>> surprised given that the sentence which 0001 replaces seems like a
>> direct violation of that also: "In neither case is a rewrite of the
>> table required."
>>
>
> IMO mostly due to the structuring of the paragraphs; something like the 
> following makes it less problematic (and as shown below may be sufficient to 
> address the purpose of this patch)
>
> """
> [...]
> The following alterations of the table require the entire table, and/or its 
> indexes, to be rewritten; which may take a significant amount of time for a 
> large table, and will temporarily require as much as double the disk space.
>
> Changing the type of an existing column will require the entire table and its 
> indexes to be rewritten. As an exception, if the USING clause does not change 
> the column contents and the old type is either binary coercible to the new 
> type, or an unconstrained domain over the new type, a table rewrite is not 
> needed; but any indexes on the affected columns must still be rewritten.
>
> Adding a column with a volatile DEFAULT also requires the entire table and 
> its indexes to be rewritten.
>
> The reason a non-volatile (or absent) DEFAULT does not require a rewrite of 
> the table is because the DEFAULT expression (or NULL) is evaluated at the 
> time of the statement and the result is stored in the table's metadata.
>
> The following alterations of the table require that it be scanned in its 
> entirety to ensure that no existing values are contrary to the new 
> constraints placed on the table.  Constraints backed by indexes will scan the 
> table as a side-effect of populating the new index with data.
>
> Adding a CHECK constraint requires scanning the table to verify that existing 
> rows meet the constraint.  The same goes for adding a NOT NULL constraint to 
> an existing column.
>
> A newly attached partition requires scanning the table to verify that 
> existing rows meet the partition constraint.
>
> A foreign key constraint requires scanning the table to verify that all 
> existing values exist on the referenced table.
>
> The main reason for providing the option to specify multiple changes in a 
> single ALTER TABLE is that multiple table scans or rewrites can thereby be 
> combined into a single pass over the table.
>
> Scanning a large table to verify a new constraint can take a long time, and 
> other updates to the table are locked out until the ALTER TABLE ADD 
> CONSTRAINT command is committed. For CHECK and FOREIGN KEY constraints there 
> is an option, NOT VALID, that reduces the impact of adding a constraint on 
> concurrent updates. With NOT VALID, the ADD CONSTRAINT command does not scan 
> the table and can be committed immediately. After that, a VALIDATE CONSTRAINT 
> command can be issued to verify that existing rows satisfy the constraint. 
> The validation step does not need to lock out concurrent updates, since it 
> knows that other transactions will be enforcing the constraint for rows that 
> they insert or update; only pre-existing rows need to be checked. Hence, 
> validation acquires only a SHARE UPDATE EXCLUSIVE lock on the table being 
> altered. (If the constraint is a foreign key then a ROW SHARE lock is also 
> required on the table referenced by the constraint.) In addition to improving 
> concurrency, it can be useful to use NOT VALID and VALIDATE CONSTRAINT in 
> cases where the table is known to contain pre-existing violations. Once the 
> constraint is in place, no new violations can be inserted, and the existing 
> problems can be corrected at leisure until VALIDATE CONSTRAINT finally 
> succeeds.
>
> The DROP COLUMN form does not physically remove the column, but simply makes 
> it invisible to SQL operations. Subsequent insert and update operations in 
> the table will store a null value for the column. Thus, dropping a column is 
> quick but it will not immediately reduce the on-disk size of your table, as 
> the space occupied by the dropped column is not reclaimed. The space will be 
> reclaimed over time as existing rows are updated.
>
> To force immediate reclamation of space occupied by a dropped column, you can 
> execute one of the forms of ALTER TABLE that performs a rewrite of the whole 
> table. This results in reconstructing each row with the dropped column 
> replaced by a null value.
>
> The rewriting forms of ALTER TABLE are not MVCC-safe. After a table rewrite, 
> the table will appear empty to concurrent transactions, if they are using a 
> snapshot taken before the rewrite occurred. See Section 13.5 for more details.
> [...]
> """
>
> I'm liking the idea of breaking out multiple features into their own 
> sentences or paragraphs instead of saying:
>
> "Adding a column with a volatile DEFAULT or changing the type of an existing 
> column"
>
> "Adding a CHECK or NOT NULL constraint"
>
> This later combination probably doesn't catch my attention except for this 
> discussion and the fact that there are multiple ways to add these constraints 
> and we might as well be clear about whether ALTER COLUMN or ADD COLUMN makes 
> a difference.  On that note, the behavior implied by this wording is that 
> adding a check constraint even during ADD COLUMN will result in scanning the 
> table even when a table rewrite is not required. If that is the case at 
> present nothing actually says that - if one agrees that the exact same 
> sentence doesn't imply that a table scan is performed when adding a NOT NULL 
> constraint during ADD COLUMN (which doesn't happen).
> That seems like enough material to extract out from the ALTER TABLE page and 
> stick elsewhere if one is so motivated.  There may be other stuff too - but 
> the next paragraph covers some SET DATA TYPE nuances which seem like a 
> different dynamic.

Coming back to this with fresh eyes in the morning and comparing your
idea above to the existing doc page, and I really like this approach.
I'll be marking this specific patch as withdrawn and opening a new
patch for the restructuring.

I also noticed an error in the existing docs (we no longer need to
rebuild indexes when a table rewrite is skipped), and I'll be sending
a separate patch to fix that separately.

Thanks,
James Coleman


Reply via email to