I have two tables and a join table, in principle like this:

CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE features (id INTEGER PRIMARY KEY, data TEXT UNIQUE);
CREATE TABLE records_features (id_r INTEGER, id_f INTEGER, ord INTEGER);

A record consists of one or more features, and a feature can exist in
multiple records. The "ord INTEGER" is there to keep track of the
order of the features. I want to avoid duplicate feature sets. We can
assume that the ord values will match for two sets.

So let's say there's a relationship stored in the DB:

INSERT INTO records (data) VALUES("record1");
=> 10
INSERT INTO features (data) VALUES("feature1");
=> 20
INSERT INTO features (data) VALUES("feature2");
=> 21
INSERT INTO features (data) VALUES("feature3");
=> 22
INSERT INTO records_features (id_r, id_f, ord) VALUES(10,20,0);
INSERT INTO records_features (id_r, id_f, ord) VALUES(10,21,1);
INSERT INTO records_features (id_r, id_f, ord) VALUES(10,22,2);

Later, if I come across a new record that has exactly features
"feature1", "feature2" and "feature3" (IDs 20,21 and 22), I don't want
to add a new record for this. So given a new set of feature IDs, how
can I best check if there's another record that has these exact
features before I insert it?

To be clear, a record with features "feature1", "feature2", "feature4"
would be ok. So would "feature1", "feature2". Subsets are ok, but not
exact duplicates.

One way could be to drop the join table and do a string concat of the
feature IDs in a UNIQUE column in the record table, ie:

CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT, feature_ids
TEXT UNIQUE);

INSERT INTO records (data, feature_ids) VALUES("record1", "20,21,22");

but I'd prefer to avoid manually having to parse the IDs out of a
string. It does appear to be a hairy way to solve what I need though.

A combination of the two would be better. The string "20,21,22" would
function as a unique key avoiding duplicates, and if I also keep the
join table I can still query the database without parsing strings. I
won't be doing many updates, so I'm not really concerned about editing
these strings.

But I'm sure there's a better way. Do I need to use a CTE for this?

-Ketil

Reply via email to