On Mon, Aug 20, 2012 at 5:14 PM, Jeff Davis <pg...@j-davis.com> wrote: > On Mon, 2012-08-20 at 16:50 -0400, Robert Haas wrote: >> #3 for foreign tables. > > I'm skeptical of that approach for two reasons: > > (1) It will be hard to inform users which constraints are enforced and > which aren't.
The thing to keep in mind here is that EVERY property of a foreign table is subject to change at any arbitrary point in time, without our knowledge. The very existence of the table, the names of its columns, the types of those columns, and any additional constraints on that columns are all things that can vary between the declaration and what's actually present, and can be subsequently changed on the remote side at any time. Why should CHECK constraints be any different than, say, column types? > (2) It will be hard for users to understand the planner benefits or the > consequences when the constraint is not enforced. Why should that be any worse with foreign tables than anything else? I mean, lots of people, as things stand today, manage to set up partitioned tables using CHECK constraints. There are undoubtedly people who don't understand the planner benefit of having an appropriate CHECK constraint on each partition, but it's not exactly a common cause of confusion. > That being said, I can imagine good use cases (like when the foreign > table is in postgres, and already has that constraint declared), so I'm > not outright opposed to it. > >> #1 is not a reasonable alternative for foreign >> tables because we lack enforcement power in that case, > > Right. > >> and #2 is also >> not reasonable, because the only point of allowing declarative >> constraints is to get better performance, and if we go with #2 then >> we've pretty much thrown that out the window. > > Declared constraints can improve the plans, while runtime-enforced > constraints slow down execution of a given plan. I'm not really sure > whether runtime enforcement is a good trade-off, but it doesn't seem > like an obviously bad one. It seems to me that the term runtime enforcement is a misnomer; you can't really "enforce" anything about a foreign table. You can throw an error if the data doesn't meet expectations, but by that time the cat is already out of the bag. The purpose of a CHECK constraint on a plain table is to prevent bad data from going into a table, not to throw an error when retrieving previously-inserted bad data. If we were to propose changing the semantics from the former to the latter, we'd be laughed out of town, and rightly so. > Also, what did you mean by "the only point of allowing declarative > constraints is to get better performance"? Maybe the user wants to get > an error if some important assumption about the remote data source is > not as true as when they declared the constraint. I think that need is going to be best served by issuing a query to validate whatever constraint you're interested in, i.e. SELECT * FROM foreign_table WHERE NOT (whatever_i_am_assuming). I mean, let's suppose that we were to allow unique constraints on foreign tables. This is clearly useful, because, for example, it would allow join removal in a case like local_table LEFT JOIN foreign_table ON local_table.id = foreign_table.id; and it would also allow a selectivity estimate of -1 for that column. But are you going to validate that the column in question is still unique every time you access the table? Presumably not; you'd have to read and sort the entire table to do that. Now, if the query is something like "SELECT * FROM foreign_table WHERE id = 1", you could fairly cheaply validate that there is only one row with id = 1, but that's not the same thing as validating that the assumption (namely, that foreign_table (id) is unique) is still true. And if the query is "SELECT max(some_other_column) FROM foreign_table", you can't really validate anything at all, or at least not without a lot of extra overhead. Now, it's more feasible to think that you could validate a CHECK constraint on each row that the query actually touches, but that's still going to be pretty expensive, and it's still not the same thing as validating that the assumption holds true for all rows in the table. I think if we go down this road of trying to validate remote-side CHECK constraints, we're going to end up with a mishmash of cases where constraints are checked and other cases where constraints are not checked, and then that really is going to be confusing. We're definitely not going to be able to check everything all the time, so checking nothing ever seems like the principled alternative. Also, for tightly-coupled databases under common control, there's really no need to be constantly checking and rechecking that the other guy hasn't pulled the rug out from under you. We certainly need to have that as an option. For loosely-coupled databases under different administrative control there might be some point, but it seems sort of hopeless: if the other DBA doesn't care about breaking your stuff, he may happen to rename the table instead of dropping the check constraint, and then you're definitely hosed. People keep proposing the idea of being able to somehow validate remote check constraints, so maybe that is evidence that there is some use case for it, but I can't quite figure out what it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers