I have two tables and a join table, in principle like this: CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT); CREATE TABLE features (id INTEGER PRIMARY KEY, data TEXT UNIQUE); CREATE TABLE records_features (id_r INTEGER, id_f INTEGER, ord INTEGER);
A record consists of one or more features, and a feature can exist in multiple records. The "ord INTEGER" is there to keep track of the order of the features. I want to avoid duplicate feature sets. We can assume that the ord values will match for two sets. So let's say there's a relationship stored in the DB: INSERT INTO records (data) VALUES("record1"); => 10 INSERT INTO features (data) VALUES("feature1"); => 20 INSERT INTO features (data) VALUES("feature2"); => 21 INSERT INTO features (data) VALUES("feature3"); => 22 INSERT INTO records_features (id_r, id_f, ord) VALUES(10,20,0); INSERT INTO records_features (id_r, id_f, ord) VALUES(10,21,1); INSERT INTO records_features (id_r, id_f, ord) VALUES(10,22,2); Later, if I come across a new record that has exactly features "feature1", "feature2" and "feature3" (IDs 20,21 and 22), I don't want to add a new record for this. So given a new set of feature IDs, how can I best check if there's another record that has these exact features before I insert it? To be clear, a record with features "feature1", "feature2", "feature4" would be ok. So would "feature1", "feature2". Subsets are ok, but not exact duplicates. One way could be to drop the join table and do a string concat of the feature IDs in a UNIQUE column in the record table, ie: CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT, feature_ids TEXT UNIQUE); INSERT INTO records (data, feature_ids) VALUES("record1", "20,21,22"); but I'd prefer to avoid manually having to parse the IDs out of a string. It does appear to be a hairy way to solve what I need though. A combination of the two would be better. The string "20,21,22" would function as a unique key avoiding duplicates, and if I also keep the join table I can still query the database without parsing strings. I won't be doing many updates, so I'm not really concerned about editing these strings. But I'm sure there's a better way. Do I need to use a CTE for this? -Ketil