On Sun, May 31, 2009 at 3:04 PM, Bruce Momjian <br...@momjian.us> wrote:
>> > I think this is basically a large-caliber foot gun. You're going to
>> > pretend that invalid data is valid, until the user gets around to fixing
>> > it?
>>
>> What choice do we have? While we can mark indexes as invalid (which we
>> do), how do we mark a table's contents as invalid? Should we create
>> rules so no one can see the data and then have the ALTER TABLE script
>> remove the rules after it is rebuilt?
>
> OK, what ideas do people have to prevent access to tsvector columns? I
> am thinking of renaming the tables or something.

I think in this case the caliber is pretty small and this might be
sufficient. It might be nice if we had a check somewhere in the tsvector
data types so people get informative errors if their tsvectors are
old-style rather than random incorrect results, but that's mostly
gilding.

In the general case of data type representation changes I think we need
something like:

1 Change the catalog so all the tsvector columns are bytea.

2 Include a C function like migrate_tsvector(bytea) which contains a
  copy of the old data type's output function and calls the new data
  type's input function on the result.

3 Include an ALTER TABLE command which calls the C function.

The gotchas I can see with this are:

1) It only works for varlenas -- there isn't a universal fixed-length
   data type. You would probably have to invent one.

2) I'm not sure what will happen to rules and triggers which call
   functions on the old data type. If you restore the schema unchanged
   and modify the catalog directly then they will still be there but
   have mismatched types. Will users get errors? Will those errors be
   sensible errors or nonsensical ones? Will the conversion still go
   ahead or will it complain that there are things which depend on the
   column?

If the problems in (2) prove surmountable then this provides a general
solution for any varlena data type representation change. However it
will still be an O(n) conversion plus an index rebuild. That's
unfortunate, but unless we plan to ship the full set of operators,
opclasses, opfamilies, cross-data-type operators, etc. for the old data
type I see no way around it.
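As a rough illustration of steps 2 and 3 above, here is a toy Python
sketch of the "old output function, then new input function" round trip.
Everything in it is an assumption made for illustration: both byte
layouts are invented and are not the real tsvector formats, and the
names old_output, new_input, migrate and migrate_column are
hypothetical. The real thing would be a C function invoked from ALTER
TABLE.

```python
# Toy illustration only: both binary layouts below are invented for this
# sketch and are NOT the real tsvector on-disk formats.


def old_output(raw: bytes) -> str:
    """Stand-in for the old type's output function.

    Assumed old layout: each word is stored as a 1-byte length followed
    by that many ASCII bytes, in arbitrary order.
    """
    words = []
    i = 0
    while i < len(raw):
        n = raw[i]
        words.append(raw[i + 1:i + 1 + n].decode("ascii"))
        i += 1 + n
    return " ".join(words)


def new_input(text: str) -> bytes:
    """Stand-in for the new type's input function.

    Assumed new layout: words sorted, each terminated by a NUL byte.
    """
    return b"".join(w.encode("ascii") + b"\0" for w in sorted(text.split()))


def migrate(raw: bytes) -> bytes:
    """Step 2: run the old output function, feed the text to the new input."""
    return new_input(old_output(raw))


def migrate_column(rows):
    """Step 3: rewrite every row, which is why the cost is O(n)."""
    return [migrate(raw) for raw in rows]


# Two "rows" of a column that was relabelled as raw bytes (step 1).
old_rows = [b"\x03dog\x03cat", b"\x01b\x01a\x01c"]
new_rows = migrate_column(old_rows)
```

Going through the text form means the conversion needs nothing from the
old type except a copy of its output function, at the price of touching
every row once.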

I haven't heard anyone suggest we should roll back the tsvector changes
and give up the features the changes provide -- and that's just a
performance feature. If that's all it took to convince us to give up
in-place-upgrade for this data type then imagine how easy it will be to
justify for actual functional features.

(Personally I think we're fooling ourselves to think Postgres is mature
enough that we won't come up with any new improvements which will
justify a data format change. I would rather hope we'll keep coming up
with massive improvements which require major changes in every release.)

--
greg