[sqlite] Avoid duplicate sets with join table

Barry Wed, 20 Apr 2016 11:18:10 -0600

Would not the following snippet be useful to return the record ID if the
feature set already exists?


SELECT rec.id
FROM records rec
WHERE
1 = (SELECT SUM(id_f IN (21, 22, 23)) = 3 AND COUNT(*) = 3 FROM
records_features rf WHERE rf.id_r = rec.id)

You would have to have some logic in your calling language to deal with
things appropriately...

I don't know how this could be coded into a DB constraint since SQLite
lacks FOR EACH STATEMENT triggers, and to insert a superset of an existing
set you may at some point insert the existing set.

On 17 April 2016 at 17:32, James K. Lowden <jklowden at schemamania.org> wrote:

> On Sat, 16 Apr 2016 01:20:55 +0200
> Ketil Froyn <ketil at froyn.name> wrote:
>
> > I have two tables and a join table, in principle like this:
> >
> > CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT);
> > CREATE TABLE features (id INTEGER PRIMARY KEY, data TEXT UNIQUE);
> > CREATE TABLE records_features (id_r INTEGER, id_f INTEGER, ord
> > INTEGER);
>
> (As a style suggestion, consider not reduplicating plurals.  It's a
> one-man band, a 3-star general, and a 5-finger discount.  What you have
>  is a set of record_features.)
>
> > Later, if I come across a new record that has exactly features
> > "feature1", "feature2" and "feature3" (IDs 20,21 and 22), I don't want
> > to add a new record for this. So given a new set of feature IDs, how
> > can I best check if there's another record that has these exact
> > features before I insert it?
> >
> > To be clear, a record with features "feature1", "feature2", "feature4"
> > would be ok. So would "feature1", "feature2". Subsets are ok, but not
> > exact duplicates.
>
> This is an application of relational division: you want to know if two
> sets are equal.  Well, almost.  You really want to know if the "new" set
> is a subset of an existing one.
>
> You're also a little stuck for lack of syntax.  There are two sets in
> question: the extant one and the one to be inserted.  You can insert a
> set in SQLite,
>
>         insert into T values (a), (b), (c);
>
> but because that set of rows (a, b, and c) doesn't have a name, you
> can't refer to them again in the same query.  So you have to dance a
> little jig, something like:
>
>         begin transation
>         insert into record features ...
>         delete from record features
>         id = new_id
>         where exists (
>                 select 1 from record features as rf
>                 where rf.id_r = record_features.id_r
>                 and rf.id_f = record_features.id_f
>                 group by id
>                 having count(*) = (
>                         select count(*)
>                         from record_features where id_r = new_id
>                 )
>         );
>         commit transaction
>
> That puts the rows into the database -- where they can be examined --
> and deletes them if they turn out to be a duplicate.  You could also
> apply the same logic in an insert & update trigger, and probably should
> if the rule applies to the *data*, and not just the application's
> expectations.
>
> --jkl
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

[sqlite] Avoid duplicate sets with join table

Reply via email to