As a first-time poster may I be forgiven a very basic (yet somewhat long)
question?

I'm using SQLite with a relatively large (+100GB, 1.7 billion records and
growing) dataset. It's working wonderfully, in part because I've kept the
data separated on a natural key into about 250 individual databases, 3GB at
the biggest. This keeps things quite fast, but creates certain complexities.
To wit:

My main data has a compound primary key, one of whose attributes is an
externally defined and maintained code which can take several thousand
different values. There's also a unit attribute (which can take about 10
values.)

CREATE TABLE Code_Units    ("code", "unit", PRIMARY KEY ("code"));
CREATE TABLE Base_Data_1 ("partner", "code", "unit", "value", PRIMARY KEY
("partner", "code"));

Say that Base_Data_1 contains:
P1 | A | kg | 12.3
P2 | B | ml | 456
P3 | A | kg | 7.89

If this were the only base data table, I could simply do INSERT INTO
Code_Units SELECT DISTINCT code, unit from Base_Data_1; and be done with it.
Seems to work OK, although there's probably a better method.

The problem is that I've got 250 base tables which have to get poured into
this one, and  INSERT INTO with default conflict resolution will only work
once, since 'code' is the primary key. I can change the conflict resolution,
which suppresses the error, but say that Base_Data_2 (with the same schema
as _1) contains:

P6 | B | ml | 1.01
P7 | A | ml | 42.0

That P7 record is my nightmare. INSERT OR IGNORE and INSERT OR REPLACE will
lead to different Code_Units tables, but will silently pass over the
inconsistency, whereas I want a constraint violation.

I think this is what triggers are for. I can write:

CREATE TRIGGER Ensure_Consistent_Code_Unit_Mapping
BEFORE UPDATE OF unit ON Code_Units FOR EACH ROW
WHEN NEW.unit != OLD.unit
BEGIN SELECT RAISE(ABORT, "New data contains a code->unit mapping which is
inconsistent with previous data"); END;

and then, in my host application, bind SELECT DISTINCT results from
Base_Data_2 to UPDATE Code_Units SET unit = ? WHERE code = ? and execute
that. This works great until Base_Data_3 comes along, which contains a NEW
code->unit mapping:

P10 | C | au | 6.022

UPDATE, of course, doesn't put this new mapping into the table. There's no
UPDATE OR INSERT semantic that I can find, and since INSERT OR REPLACE is
deleting and then inserting, it can't logically fire an UPDATE trigger,
which (as I understand it) is the only way I could inspect both the old and
the new row.

The best I've come up with is to do an INSERT OR IGNORE first, to handle new
codes, and *then* do a parameter-bound UPDATE with the trigger above to
check up on old codes. This doesn't feel quite right, though. Two questions:

1. Is there a way of doing this in one clean statement?
2. Is there a way of avoiding parameter binding entirely and somehow using
SELECT DISTINCT within the UPDATE?
Bonus question 3. Am I thinking of this all wrong?

I could do this procedurally in my host language (Python) in three lines
using a hash table and an assertion, but since this is my first-ever attempt
at using SQL relationally (as opposed to as a convenient .CSV replacement)
I'm trying to be disciplined about it and use it as a learning experience,
rather than sticking with methods I already know. Also, this project is
going to be updated day-to-day by non-programmers, so I'd like the
constraints as deep inside the DB as possible.

Thank you all so much for your patience. I feel like this is in Chapter 1 of
all the database books I haven't read, and not in any of the ones I do have
(at least that I can find.) Maybe I'm being punished for splitting the base
table on part of its primary key?

-gc3
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to