I've been looking at the tsearch patch a bit, and I think there needs to
be more thought given to the permissions required to mess around with
tsearch configuration objects.
The TSParser objects reference functions declared to take and return
INTERNAL arguments. This means that the underlying functions must be
coded in C and can only be installed by a superuser, which in turn means
that there is no scenario where it is really useful for a non-superuser
to execute CREATE PARSER. What's more, allowing a non-superuser to do
it creates security holes: if you can find an unrelated function taking
the right number of INTERNAL arguments, you can install it as a TSParser
support function. That trivially allows crashing the backend, and it
could allow worse security holes than that.
TSDictionary objects have exactly the same issues since they also depend
on functions with INTERNAL arguments.
At minimum this means that we should restrict CREATE/DROP/ALTER commands
for these objects to superusers. (Which in turn means there's no point
in tracking an ownership column for them; every superuser is the same as
every other one, permissions-wise.) I'm wondering though whether this
doesn't mean that we don't need manipulation commands for them at all.
Is it likely that people will be adding parser or dictionary support to
an installation on the fly? Maybe we can just create 'em all at initdb
time and be done, similar to the way index access methods are treated.
This doesn't say that it's not possible to add more; you can add an
index access method on the fly too, if you want, by inserting stuff into
pg_am by hand. I'm just wondering whether all that SQL-statement
support and pg_dump support for custom parsers and dictionaries is
really worth the code space and future maintenance effort it'll eat up.
You could remove the immediate source of this objection if you could
redesign the APIs for the underlying support functions to be more
type-safe. I'm not sure how feasible or useful that would be though.
The bottom-line question here is whether developing a new parser or
dictionary implementation is really something that ordinary users might
do. If not, then having all this SQL-level support for setting up
catalog entries seems like wasted effort.
TSConfiguration objects are a different story, since they have only
type-safe dependencies on parsers, locales, and dictionaries. But they
still need some more thought about permissions, because AFAICS mucking
with a configuration can invalidate some other user's data. Do we want
to allow runtime changes in a configuration that existing tsvector
columns already depend on? How can we even recognize whether there is
stored data that will be affected by a configuration change? (AFAICS
the patch doesn't put anything into the pg_depend machinery that could
deal with this.) And who gets to decide which configuration is default,
I'm also a bit disturbed that you've made searches for TSConfiguration
objects be search-path-sensitive. That is likely to create problems
similar to those we've recently recognized for function lookup, eg,
an insertion into a full-text-indexed column gets treated differently
depending on the caller's search path. It's particularly bad to have
the default object be search-path-dependent. We learned the hard way
not to do that for default index operator classes; let's not make the
same mistake again for tsearch configurations.
Next, it took me a while to understand how Mapping objects fit into
the scheme at all, and now that (I think) I understand, I'm wondering
why treat them as an independent concept. Seems like the mapping from
token types to dictionaries is really a property of a configuration,
and we ought to be handling it through options of CREATE/ALTER
CONFIGURATION commands, not as an apparently independent object type.
The way the patch is doing it feels like implementing CREATE ATTRIBUTE
as a separate command instead of having ALTER TABLE ADD COLUMN; it's
just weird, and it's not obvious that dropping a configuration should
make the associated mapping object go away.
Lastly, I'm unhappy that the patch still keeps a lot of configuration
information, such as stop word lists, in the filesystem rather than the
database. It seems to me that the single easiest and most useful part
of a configuration to change is the stop word list; but this setup
guarantees that no one but a DBA can do that, and what's more that
pg_dump won't record your changes. What's the point of having any
non-superuser configuration capability at all, if stop words aren't part
of what you can change?
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster