On Thu, Dec 12, 2019 at 2:45 AM Julien Rouhaud <rjuju...@gmail.com> wrote: > Hearing no objection in [1], attached v5 with following changes: > > - fix the spurious warnings by doing the version check in > get_relation_info and vacuum_open_relation, and add a flag in > RelationData to mark the check as already being done > - change the IsCatalogRelation() check to IsSystemRelation() to also > ignore toast indexes, as those can also be safely ignored too > - add a new ALTER INDEX idxname DEPENDS ON COLLATION collname CURRENT > VERSION to let users remove the warnings for safe library upgrade. > Documentation and tab completion added > > If I'm not mistaken, all issues I was aware of are now fixed.
Thanks! This is some great progress and I'm feeling positive about getting this into PostgreSQL 13. I haven't (re)reviewed the code yet, but I played with it a bit and have some more feedback. There are some missing semi-colons on the ALTER INDEX statements in pg_dump.c that make the pg_upgrade test fail (at least, if LC_ALL is set). We create duplicate pg_depend records: postgres=# create table t (x text); CREATE TABLE postgres=# create index on t(x) where x > 'hello'; CREATE INDEX postgres=# select * from pg_depend where objid = 't_x_idx'::regclass and refobjversion != ''; classid | objid | objsubid | refclassid | refobjid | refobjsubid | refobjversion | deptype ---------+-------+----------+------------+----------+-------------+---------------+--------- 1259 | 16424 | 0 | 3456 | 100 | 0 | 0:34.0 | n 1259 | 16424 | 0 | 3456 | 100 | 0 | 0:34.0 | n (2 rows) I wondered if that was harmless, but for one thing it causes duplicate warnings: postgres=# update pg_depend set refobjversion = 'BANANA' where refobjversion = '0:34.0'; UPDATE 2 [new session] postgres=# select count(*) from t; WARNING: index "t_x_idx" depends on collation "default" version "BANANA", but the current version is "0:34.0" DETAIL: The index may be corrupted due to changes in sort order. HINT: REINDEX to avoid the risk of corruption. WARNING: index "t_x_idx" depends on collation "default" version "BANANA", but the current version is "0:34.0" DETAIL: The index may be corrupted due to changes in sort order. HINT: REINDEX to avoid the risk of corruption. Here's another way to get a duplicate, and in this example you also get an unnecessary dependency on 100 "default" for this index: postgres=# create index on t(x collate "fr_FR") where x > 'helicopter' collate "fr_FR"; CREATE INDEX postgres=# select * from pg_depend where objid = 't_x_idx'::regclass and refobjversion != ''; classid | objid | objsubid | refclassid | refobjid | refobjsubid | refobjversion | deptype ---------+-------+----------+------------+----------+-------------+---------------+--------- 1259 | 16460 | 0 | 3456 | 12603 | 0 | 0:34.0 | n 1259 | 16460 | 0 | 3456 | 12603 | 0 | 0:34.0 | n 1259 | 16460 | 0 | 3456 | 100 | 0 | 0:34.0 | n (3 rows) Or... maybe 100 should be there, by simple analysis of the x in the WHERE clause, but it's the same if you write x collate "fr_FR" > 'helicopter' collate "fr_FR", and in that case there are no expressions of collation "default" anywhere. The indirection through composite types works nicely: postgres=# create type foo_t as (en text collate "en_CA", fr text collate "fr_CA"); CREATE TYPE postgres=# create table t (foo foo_t); CREATE TABLE postgres=# create index on t(foo); CREATE INDEX postgres=# select * from pg_depend where objid = 't_foo_idx'::regclass and refobjversion != ''; classid | objid | objsubid | refclassid | refobjid | refobjsubid | refobjversion | deptype ---------+-------+----------+------------+----------+-------------+---------------+--------- 1259 | 16444 | 0 | 3456 | 12554 | 0 | 0:34.0 | n 1259 | 16444 | 0 | 3456 | 12597 | 0 | 0:34.0 | n (2 rows) ... but again it shows the extra and technically unnecessary dependencies (only 12603 "fr_FR" is really needed): postgres=# create index on t(((foo).fr collate "fr_FR")); CREATE INDEX postgres=# select * from pg_depend where objid = 't_fr_idx'::regclass and refobjversion != ''; classid | objid | objsubid | refclassid | refobjid | refobjsubid | refobjversion | deptype ---------+-------+----------+------------+----------+-------------+---------------+--------- 1259 | 16445 | 0 | 3456 | 12603 | 0 | 0:34.0 | n 1259 | 16445 | 0 | 3456 | 12597 | 0 | 0:34.0 | n 1259 | 16445 | 0 | 3456 | 12554 | 0 | 0:34.0 | n (3 rows) I check that nested types are examined recursively, as appropriate. I also tested domains, arrays, arrays of domains, expressions extracting an element from an array of a domain with an explicit collation, and the only problem I could find was more ways to get duplicates. Hmm... what else is there that can contain a collatable type...? Ranges! postgres=# create type myrange as range (subtype = text); CREATE TYPE postgres=# drop table t; DROP TABLE postgres=# create table t (x myrange); CREATE TABLE postgres=# create index on t(x); CREATE INDEX postgres=# select * from pg_depend where objid = 't_x_idx'::regclass and refobjversion != ''; classid | objid | objsubid | refclassid | refobjid | refobjsubid | refobjversion | deptype ---------+-------+----------+------------+----------+-------------+---------------+--------- (0 rows) ... or perhaps, more realistically, a GIST index might actually be useful for range queries, and we're not capturing the dependency: postgres=# create index t_x_idx on t using gist (x); CREATE INDEX postgres=# select * from pg_depend where objid = 't_x_idx'::regclass and refobjversion != ''; classid | objid | objsubid | refclassid | refobjid | refobjsubid | refobjversion | deptype ---------+-------+----------+------------+----------+-------------+---------------+--------- (0 rows) The new syntax "ALTER INDEX i_name DEPENDS ON ANY COLLATION UNKNOWN VERSION" doesn't sound good to me, it's not "ANY" collation, it's a specific set of collations that we aren't listing. "ALTER INDEX i_name DEPENDS ON COLLATION * VERSION UNKNOWN", hrmph, no that's terrible... I'm not sure what would be better. I'm not sure if I like the idea of VACUUM reporting warnings or not. Hmm. To state more explicitly what's happening here, we're searching the expression trees for subexpresions that have a collation as part of their static type. We don't know which functions or operators are actually affected by the collation, though. For example, if an expression says "x IS NOT NULL" and x happens to be a subexpression of a type with a particular collation, we don't now that this expression's value can't possibly be affected by the collation version changing. So, the system will nag you to rebuild an index just because you mentioned it, even though the index can't be corrupted. To do better than that, I suppose we'd need declarations in the catalog to say which functions/operators are collation sensitive. Then, as a special case, there is the collation of the actual indexed value, because that will implicitly be used as input to the btree ops that would be collation sensitive. That's just a thought experiment: it seems like massive overkill to try to catalog collation sensitivity for a rather limited benefit, and I'm happy with the way you have it. More soon.