Hi all!

The Problem
-----------------
One case that traditional SQL doesn't handle very well is when you have a child 
entity which can be attached to a number of different parent entities. Examples 
might be comments, tags or file attachments - we might have 20 different 
entities in the system that we would like our users to be able add comments to, 
but the existing solutions for mapping this all have downsides.

Existing solution 1: Join tables ahoy
If I can have a list of comments on every other object in the system, and I 
want to have referrential integrity, then the obvious thing to do is create a 
join table between each entity and the comments table.
Pros:
 - Straight forward, traditional object-with-collection-of-child SQL structure
Cons:
 - If a parent object gets deleted here, we can't use foreign keys to delete 
e.g. a child comment, so we'll have to either explicitly do it as part of our 
delete logic or have a cleanup process to catch orphans. Or do a dance with 
delete triggers on the join tables deleting the comment.
 - For n entities requiring comments in the system, we need n join tables.If we 
want both comments and e.g. tags and likes on all of our entities, we now have 
3n join tables for what should be some relatively self-contained on-the-side 
data - this is could be more tables than the entire rest of the system
 - It's difficult to create any kind of self-contained component for building 
applications in this scenario, as it will need to know about every other entity 
in the system, or be able to poke around inside whatever ORM or data access 
system that you have to work out what join tables it needs when running queries.

Existing solution 2: Enter the matrix
Instead of having n join tables, let's just mash them all together, with a 
column per parent object, and a check constraint to force exactly one of those 
columns to be set.
Pros:
 - Less bloat in the number of tables
Cons:
 - Doesn't solve orphan problem
 - Addition of a new entity which needs comments and we now have to add another 
column onto it, potentially rewriting the whole thing
 - Ugly

Existing solution 3: Embed the matrix
Kinda like the dependency matrix table, except that all the columns referencing 
potential parent objects we put into the comment table instead.
Pros:
 - Everything contained in column table
 - No orphans, since cascaded deletes will now delete the actual comment
Cons:
 - Comment table now has references to every single type that it may be 
attached to
 - Addition of a new entity and we probably have to rewrite the comment table 
now

Existing solution 4: Abandon ye all referential integrity
Have a column indicating parent type and another one for the id. In the case of 
comments this would be directly on the comment table itself. In the case of 
something like tags that we might expect to be shared between entities, it 
would be in a single join table. 
Pros:
 - Pretty self-contained
 - Data model which has neither lots of empty columns or lots of tables
 - Can make new entities "commentable" without rewriting anything
 - Because it's self-contained, can build application components that don't 
need to know much about the rest of your system. For example this is the 
approach that the grails taggable and commentable plugins take.
Cons:
 - No referential integrity, since we can't have a single column pointing to 
different tables with existing foreign key infrastructure
 - Since there's no real db support for doing things this way, existing ORMs 
etc don't really know how use a single column to join against multiple 
different tables based on a discriminator or 'switch' column.

Existing solution 5: Everything's a thing
Make your entity hierarchy have a base level object which can have comments 
attached, and then everything that you need to be "commentable" has to extend 
that. You can do that in an ORM, or with table inheritance in the database.
Pros:
 - Single top-level thing to hang your data on
Cons:
 - You've polluted your object hierarchy just to hang some stuff off of the end 
of it rather than it being driven by behaviours
 - You're going to be paying a performance penalty - everything that extends 
that base level object will now join against it incessantly, and you now have a 
global id sequence or whatever that you may not want.

Basically none of the above handle the situation very well. The cleanest is 
solution 4, but lack of RI sucks.

Feature Proposal: Selective foreign keys.
-------------------------------------------------
Allow foreign keys to have where clauses. The above comment example using 
solution 4 might then look like then following:

CREATE TABLE comment as (
  id bigserial primary key,
  content text not null,
  parent_entity regclass not null,
  parent_id int8
);
ALTER TABLE comment ADD CONSTRAINT comment_blog_fk FOREIGN KEY (parent_id) 
REFERENCES blog(id) WHERE (parent_entity = ‘blog');
ALTER TABLE comment ADD CONSTRAINT comment_event_fk FOREIGN KEY (parent_id) 
REFERENCES event(id) WHERE (parent_entity = ‘event');

At this point, the following things should work:
insert into blog(id, title, content) values (10, 'i hate mondays', 'so sad');
insert into event(id, title, location, date) values (20, 'my birthday party', 
'local pub', now());
insert into comment(content, parent_entity, parent_id) values ('me too', 
'blog', 10);
insert into comment(content, parent_entity, parent_id) values ('I love that 
pub', 'event', 20);

But the following would barf:
insert into comment(content, parent_entity, parent_id) values ('bad comment', 
'blog', 20);
ERROR: insert or update on table "comment" violates foreign key constraint 
"comment_blog_fk"
  Detail: Key (parent_id)=(20) is not present in table "blog".: 

Patch
-----
I have attached a proof-of-concept patch for this feature.
What it currently does:
 - Allows where clause on FK declaration
 - Validates and stores the expression into pg_constraint a la check 
constraints, using the same validity checks
 - Implements logic on insertion to FK table, and update / delete on PK tables 
as expected.
 - Regression tests handling all of the above

Things I know are still to be done, if there is consensus that this feature is 
worth having:
 - Handle update of columns affecting the selection expression. Currently the 
RI trigger doesn’t fire if the FK column isn’t modified. I’m not sure where the 
logic for this is implemented.
 - pg_dump support
 - Update some comments in a few places referring to conbin/consrc columns of 
pg_constraint as relevant to check constraints only
 - Documentation
 - There are probably a few scenarios that I’m not testing in the regression 
tests that I should be.

Implementation Comments
----------------------------
I borrowed the exclusion constraint where clause in the grammar, which results 
in the where expression needing parentheses. With a bit more work this could 
possibly be removed, but I’m not too upset about it tbh.

Currently I’m creating a new executor for each call of RI_FKey_check when there 
is a where clause, and doing other bits of work in there. It’s not 100% clear 
to me how expensive doing that stuff in there is, and whether it’s worth it or 
possible to cache some of it.

Stuff added in tablecmds.c was copied from similar stuff elsewhere, but 
honestly without a huge amount of understanding. Don’t know if I could do 
without some of it. 


Comments?


Cheers

Tom



PS Thanks to Noah Misch who answered a question on the list a while ago which 
got me over a hump, and to the authors of the wiki pages around submitting a 
patch.

Attachment: selective-foreign-keys-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to