On Tue, Mar 27, 2012 at 11:05 AM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
>> I agree that it's not a very helpful decision, but I'm not the one who
>> said we wanted command triggers rather than event triggers. :-)
>
> Color me unconvinced about event triggers. That's not answering my use
> cases.
Could we get a list of those use cases, maybe on a wiki page somewhere, and add to it all the use cases that KaiGai Kohei or others who are interested in this are aware of? Or maybe we can just discuss via email, but it's going to be hard to agree on whether we've got something that meets the requirements if we're all imagining different sets of requirements. The main use cases I can think of are:

1. Replication. That is, after we perform a DDL operation, we call a trigger and tell it what we did, so that it can make a record of that information and ship it to another machine, where it will arrange to do the same thing on the remote side.

2. Redirection. That is, before we perform a DDL operation, we call a trigger and tell it what we've been asked to do, so that it can go execute the request elsewhere (e.g. the master rather than the Hot Standby slave).

3. Additional security or policy checks. That is, before we perform a DDL operation, we call a trigger and tell it what we've been asked to do, so that it can throw an error if it doesn't like our credentials or, say, the naming convention we've used for our column names.

4. Relaxation of security checks. That is, we let a trigger get control at permissions-checking time and let it make the go-or-no-go decision in lieu of the usual permissions-checking code.

5. Auditing. Either when something is attempted (i.e. before) or after it happens, we log the attempt/event somewhere.

Anything else?

In that list of use cases, it seems to me that you want BEFORE and AFTER triggers to have somewhat different firing points. For the BEFORE cases, you really need the command trigger to fire early, and only once. For example, if someone says "ALTER TABLE dimitri ADD COLUMN fontaine INTEGER, DROP COLUMN IF EXISTS haas", permissions on dimitri should really only get checked once, not once for each subcommand. That's the point at which you need to get control for #3 or #4, and it would be workable for #5 as well; I'm less sure about #2.

On the other hand, for the AFTER cases I've listed here, I think you really want to know what *actually* happened, not what somebody thought about doing. You want to know which tables, sequences, etc. *actually* got created or dropped, not the ones that the user mentioned. If the user mentioned a table but we didn't drop it (and we also didn't error out, because IF EXISTS is used), none of the AFTER cases really care; if we dropped other stuff (because of CASCADE), the AFTER cases may very well care.
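To make that distinction concrete, here's a contrived example; the object names are made up purely for illustration:

    -- Assume "foo" does not exist, and a view "bar_v" depends on table "bar".
    DROP TABLE IF EXISTS foo, bar CASCADE;

    -- A BEFORE trigger fires once, up front, and sees the request as written:
    -- the user asked to drop foo and bar.
    -- The AFTER cases want the outcome instead: foo was silently skipped
    -- (IF EXISTS), so there's nothing to replicate or audit for it, while
    -- bar *and* the dependent view bar_v were actually dropped, and both
    -- need to be reported.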
Another thing to think about with respect to deciding on the correct firing points is that you can't easily fire the trigger after we've identified the object in question but before we've checked permissions on it, which otherwise seems like an awfully desirable thing to do for use cases 3, 4, and 5 from the above list, and maybe 2 as well. We don't want to try to take locks on objects that the current user doesn't have permission to access, because then a user with no permissions whatsoever on an object can interfere with access by authorized users. On the flip side, we can't reliably check permissions before we've locked the object, because somebody else might rename or drop it after we check permissions and before we get the lock.

Noah Misch invented a clever technique that I then used to fix a bunch of these problems in 9.2 (sadly, not all cases are fixed yet, because we ran out of time in the development cycle). The fixed code looks up the object name, checks permissions (erroring out if the check fails), and then locks the object. Once it gets the lock, it checks whether any shared-invalidation messages have been processed since the point just before we looked up the object name. If so, it redoes the name lookup. If the referent of the name has not changed, we're done; if it has, we drop the old lock, lock the new object, and loop around again, not being content until we're sure that the object we locked is still the referent of the name. This leads to much more logical behavior than the old way of doing things, and not incidentally gets rid of a lot of errors of the form "cache lookup failed for relation %u" that users of existing releases will remember, probably not too fondly.
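In code, the loop looks roughly like this. This is a simplified sketch for illustration, not the real implementation (which lives in RangeVarGetRelidExtended() and takes a callback for the permissions check); the function name below is made up, error handling is omitted, and ACL_UPDATE is just a stand-in for whatever permission the command at hand actually requires:

    #include "postgres.h"
    #include "catalog/namespace.h"
    #include "miscadmin.h"
    #include "storage/lmgr.h"
    #include "storage/sinval.h"
    #include "utils/acl.h"

    static Oid
    lookup_check_and_lock(const RangeVar *relation, LOCKMODE lockmode)
    {
        Oid     relId;
        Oid     oldRelId = InvalidOid;
        uint64  inval_count;

        for (;;)
        {
            /* Remember how many sinval messages we've absorbed so far. */
            inval_count = SharedInvalidMessageCounter;

            /* Translate the name to an OID; no lock is taken here. */
            relId = RangeVarGetRelid(relation, NoLock, true /* missing_ok */);

            /*
             * Check permissions *before* taking any lock, so that an
             * unprivileged user can't queue a lock request that blocks
             * everyone else's access to the table.
             */
            if (OidIsValid(relId) &&
                pg_class_aclcheck(relId, GetUserId(), ACL_UPDATE) != ACLCHECK_OK)
                aclcheck_error(ACLCHECK_NO_PRIV, ACL_KIND_CLASS,
                               relation->relname);

            /*
             * If this is a retry and the name still maps to the object we
             * already locked on the previous pass, we're done.
             */
            if (OidIsValid(relId) && relId == oldRelId)
                break;

            /*
             * Otherwise release any lock from the previous iteration and
             * lock the object the name currently refers to.
             */
            if (OidIsValid(oldRelId))
                UnlockRelationOid(oldRelId, lockmode);
            if (OidIsValid(relId))
                LockRelationOid(relId, lockmode);

            /*
             * If no invalidation messages arrived while we were waiting for
             * the lock, nobody can have renamed or dropped the object under
             * us, so the lookup is still good and we're done.
             */
            if (inval_count == SharedInvalidMessageCounter)
                break;

            /* Something changed; redo the lookup and compare. */
            oldRelId = relId;
        }

        return relId;
    }

If a command trigger were injected between the name lookup and the lock here, it would be running with no lock held and would have to cope with being called more than once for the same command, which is exactly the problem discussed next.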
However, this technique has serious implications for triggers that want to relax security policy, because the scope of what you can do inside that loop is pretty limited. You can't really do anything to the relation while you're checking permissions on it, because you haven't locked it yet. If you injected a trigger there, it would have to be similarly limited, and I don't know how we'd enforce that; and it would have to be prepared to get called multiple times for the same operation to handle the retry logic, which would be awkward for other cases, like auditing.

For cases where people just want to impose extra security or policy checks, we could possibly just punt and do them after the lock is taken, as Dimitri's patch does. That has the disadvantage that you can obtain a lock on an object that you don't have permissions on, but maybe that's tolerable.

To understand the issue here, think about LOCK TABLE foo. If the table isn't otherwise locked and you lack permissions, this will fail immediately. But in 9.1, it will try to AccessExclusiveLock the table first and then check permissions. As soon as it gets the lock, it will fail; but if other people are using the table, they will block the AccessExclusiveLock, and the pending AccessExclusiveLock will in turn block any new locks from being granted. So essentially all new operations against the table get blocked until all the existing transactions that have touched that table commit or roll back, even though the guy doing the LOCK TABLE has no privileges. Bummer.

However, I think we could possibly live with some limitations of this kind for *additional* checks imposed by command triggers; we'll still be protected against someone getting a table lock when they have no privileges at all. It's a lot more awkward when someone wants to *relax* privilege checks, though: I'm fairly confident that we *don't* want to go back to the pre-9.2 behavior of locking first and asking questions later.

> Try to build a command string from the catalogs… even if you can store a
> snapshot of them before and after the command. Remember that you might
> want to “replicate” to things that are NOT a PostgreSQL server.

For CREATE commands, I am sure this is doable, although there are also other ways. For ALTER commands, I agree that you need to pass detailed information about what is being altered and how. But that drags you well away from the idea of a command trigger. At a minimum, you're going to want to know about each subcommand of ALTER TABLE.

However, I think it's unrealistic to suppose we're going to get all of that straightened out in time for 9.2. KaiGai took a whole release cycle to get working support for just CREATE and DROP.

>> ambiguity. If you say that we're going to have a trigger on the
>> CREATE SEQUENCE command, then what happens when the user creates a
>> sequence via some other method? The current patch says that we should
>> handle that by calling the CREATE SEQUENCE trigger if it happens to be
>> convenient because we're going through the same code path that a
>> normal CREATE SEQUENCE would have gone through, but if it uses a
>> different code path then let's not bother. Otherwise, how do you
>
> Yes, the current set of which commands fire which triggers is explained
> by how the code is written wrt standard_ProcessUtility() calls. We could
> mark re-entrant calls and disable the command trigger feature, it would
> not be our first backend global variable in flight.

Agreed, although I think just adding an argument to ProcessUtility would be enough.

>> Dimitri is not the first or last person to want to get control during
>> DDL operations, and KaiGai's already done a lot of work figuring out
>> how to make it work reasonably. Pre-create hooks don't exist in that
>> machinery not because nobody wants them, but because it's hard. This
>
> I've been talking with Kaigai about using the Command Trigger
> infrastructure to implement its control hooks, while reviewing one of
> his patches, and he said that's not low-level enough for him.

Right. That worries me.

The infrastructure KaiGai is using is low-level in two senses. First, it requires you to write a C module, load it into the backend, and use _PG_init to frob global variables. That is, it's not user-friendly; you have to do very low-level things to get access to the feature. Second, it's fine-grained: a single command can potentially hit those hooks many times, because each individual operation triggers a hook call, not the command as a whole.

When KaiGai says that command triggers wouldn't work for his purposes, he's talking about the second issue, not the first one. It might be less efficient for him to define his hooks as C-callable functions, register them in pg_proc, and install them using CREATE COMMAND TRIGGER, but it would work. No problem. However, the firing points you've chosen would not be useful for his purposes. I'm beating a dead horse here, but I think that's because you've got the wrong firing points, not because his needs are somehow very different from everything that anyone else wants to do.

> Sweating over that feature is a good summary of a whole lot of my and
> some others' time lately.

Understood.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company