Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

Vitaly Burovoy Thu, 06 Oct 2016 10:04:23 -0700

On 10/6/16, Serge Rielau <[email protected]> wrote:
>> On Oct 6, 2016, at 9:20 AM, Tom Lane <[email protected]> wrote:
>> Vitaly Burovoy <[email protected]> writes:
>>> But what I discover for myself is that we have pg_attrdef separately
>>> from the pg_attribute. Why?
>>
>> The core reason for that is that the default expression needs to be
>> a separate object from the column for purposes of dependency analysis.
>> For example, if you have a column whose default is "foo()", then the
>> default expression depends on the function foo(), but the column should
>> not: if you drop the function, only the default expression ought to
>> be dropped, not the column.
>>
>> Because of this, the default expression needs to have its own OID
>> (to be stored in pg_depend) and it's convenient to store it in a
>> separate catalog so that the classoid can identify it as being a
>> default expression rather than some other kind of object.
>
> Good to know.
>
>> If we were going to allow these missing_values or creation_defaults
>> or whatever they're called to be general expressions, then they would
>> need
>> to have their own OIDs for dependency purposes.  That would lead me to
>> think that the best representation is to put them in their own rows in
>> pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish
>> regular defaults from these things.  Or maybe they even need their own
>> catalog, depending on whether you think dependency analysis would want
>> to distinguish them from regular defaults using just the classed.
>>
>> Now, as I just pointed out in another mail, realistically we're probably
>> going to restrict the feature to simple constants, which'd mean they will
>> depend only on the column's type and can never need any dependencies of
>> their own.  So we could take the shortcut of just storing them in a new
>> column in pg_attribute.


I agree with you.

>> But maybe that's shortsighted and we'll
>> eventually wish we'd done them as full-fledged separate objects.

I don't think so. If we try to implement non-blocking adding columns
with volatile defaults (and for instance update old rows in the
background), we can end up with the next situation:

CREATE TABLE a(i bigint PRIMARY KEY);
INSERT INTO a SELECT generate_series(1,10000000000);

ALTER TABLE a ADD COLUMN b bigserial CHECK (b BETWEEN 1 AND 100);

For indexes (even unique) created concurrently similar troubles are
solved with a "not valid" mark, but what to do with a heap if we try
to do it in the background?

>> But on the third hand ... once one of these is in place, how could you
>> drop it separately from the column?  That would amount to a change in the
>> column's stored data, which is not what one would expect from dropping
>> a separate object.  So maybe it's senseless to think that these things
>> could ever be distinct objects.  But that definitely leads to the
>> conclusion that they're constants and nothing else.

> I cannot follow this reasoning.
> Let’s look past what PG does today:
> For each row (whether that’s necessary or not) we evaluate the expression,
> compute the value and
> store it in the rewritten table.
> We do not record dependencies on the “pedigree” of the value.
> It happened to originate from the DEFAULT expression provided with the ADD
> COLUMN,
> but that is not remembered anywhere.
> All we remember is the value - in each row.
> So the only change that is proposed here - when it comes right down to it -
> is to remember the value once only (IFF it is provably the same for each row)
> and thus avoid the need to rewrite the table.
> So I see no reason to impose any restriction other than “evaluated value is
> provably the same for every row”.

Tom says the same thing. The expression at the end should be a value
if it allows to avoid rewriting table.

> Regarding the location of storage.
> I did start of using pg_attrdef, but ran into some snags.
> My approach was to add the value as an extra column (rather than an extra
> row).
> That caused trouble since a SET DEFAULT operation is decomposed into a DROP
> and a SET and
> preserving the value across such operations did not come naturally.

I'm sorry for making you be confused. The best way is to use an extra
column in the pg_attribute to store serialized value.

> If we were to use extra rows instead that issue would be solved, assuming we
> add a “default kind” sort of column.
> It would dictate the storage format though which may be considered overkill
> for a a constant.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast AT ADD COLUMN with DEFAULTs

Reply via email to