Re: [HACKERS] pg_migrator and an 8.3-compatible tsvector data type

Greg Stark Sun, 31 May 2009 07:30:52 -0700

On Sun, May 31, 2009 at 3:04 PM, Bruce Momjian <[email protected]> wrote:
>> > I think this is basically a large-caliber foot gun.  You're going to
>> > pretend that invalid data is valid, until the user gets around to fixing
>> > it?
>>
>> What choice do we have?  While we can mark indexes as invalid (which we
>> do), how do we mark a table's contents as invalid?  Should we create
>> rules so no one can see the data and then have the ALTER TABLE script
>> remove the rules after it is rebuilt?
>
> OK, what ideas do people have to prevent access to tsvector columns?  I
> am thinking of renaming the tables or something.

I think in this case the caliber is pretty small, and this might be
sufficient. It might be nice if the tsvector data type had a check
somewhere so that people with old-style tsvectors get informative
errors rather than random incorrect results, but that's mostly gilding.

In the general case of data type representation changes, I think we
need something like:

1 Change the catalog so all the tsvector columns are bytea.

2 Include a C function like migrate_tsvector(bytea) which contains a
copy of the old data type's output function and calls the new data
type's input function on the result.

3 Include an ALTER TABLE command which calls the C function.

The gotchas I can see with this are:

1) It only works for varlenas -- there isn't a universal fixed-length
data type. You would probably have to invent one.

2) I'm not sure what will happen to rules and triggers which call
functions on the old data type. If you restore the schema unchanged
and modify the catalog directly then they will still be there but have
mismatched types. Will users get errors? Will those errors be sensible
errors or nonsensical ones? Will the conversion still go ahead or will
it complain that there are things which depend on the column?

If the problems in (2) prove surmountable, then this provides a general
solution for any varlena data type representation change. However, it
will still be an O(n) conversion plus an index rebuild. That's
unfortunate, but unless we plan to ship the full set of operators,
opclasses, opfamilies, cross-data-type operators, etc. for the old data
type, I see no way around it.

I haven't heard anyone suggest we should roll back the tsvector
changes and give up the features the changes provide -- and that's
just a performance feature. If that's all it took to convince us to
give up in-place upgrade for this data type, then imagine how easy it
will be to justify for actual functional features.

(Personally I think we're fooling ourselves if we think Postgres is
mature enough that we won't come up with any new improvements which
will justify a data format change. I would rather hope we'll keep
coming up with massive improvements which require major changes in
every release.)

-- 
greg

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
