As a first-time poster may I be forgiven a very basic (yet somewhat long) question?
I'm using SQLite with a relatively large (+100GB, 1.7 billion records and growing) dataset. It's working wonderfully, in part because I've kept the data separated on a natural key into about 250 individual databases, 3GB at the biggest. This keeps things quite fast, but creates certain complexities. To wit: My main data has a compound primary key, one of whose attributes is an externally defined and maintained code which can take several thousand different values. There's also a unit attribute (which can take about 10 values.) CREATE TABLE Code_Units ("code", "unit", PRIMARY KEY ("code")); CREATE TABLE Base_Data_1 ("partner", "code", "unit", "value", PRIMARY KEY ("partner", "code")); Say that Base_Data_1 contains: P1 | A | kg | 12.3 P2 | B | ml | 456 P3 | A | kg | 7.89 If this were the only base data table, I could simply do INSERT INTO Code_Units SELECT DISTINCT code, unit from Base_Data_1; and be done with it. Seems to work OK, although there's probably a better method. The problem is that I've got 250 base tables which have to get poured into this one, and INSERT INTO with default conflict resolution will only work once, since 'code' is the primary key. I can change the conflict resolution, which suppresses the error, but say that Base_Data_2 (with the same schema as _1) contains: P6 | B | ml | 1.01 P7 | A | ml | 42.0 That P7 record is my nightmare. INSERT OR IGNORE and INSERT OR REPLACE will lead to different Code_Units tables, but will silently pass over the inconsistency, whereas I want a constraint violation. I think this is what triggers are for. I can write: CREATE TRIGGER Ensure_Consistent_Code_Unit_Mapping BEFORE UPDATE OF unit ON Code_Units FOR EACH ROW WHEN NEW.unit != OLD.unit BEGIN SELECT RAISE(ABORT, "New data contains a code->unit mapping which is inconsistent with previous data"); END; and then, in my host application, bind SELECT DISTINCT results from Base_Data_2 to UPDATE Code_Units SET unit = ? WHERE code = ? and execute that. This works great until Base_Data_3 comes along, which contains a NEW code->unit mapping: P10 | C | au | 6.022 UPDATE, of course, doesn't put this new mapping into the table. There's no UPDATE OR INSERT semantic that I can find, and since INSERT OR REPLACE is deleting and then inserting, it can't logically fire an UPDATE trigger, which (as I understand it) is the only way I could inspect both the old and the new row. The best I've come up with is to do an INSERT OR IGNORE first, to handle new codes, and *then* do a parameter-bound UPDATE with the trigger above to check up on old codes. This doesn't feel quite right, though. Two questions: 1. Is there a way of doing this in one clean statement? 2. Is there a way of avoiding parameter binding entirely and somehow using SELECT DISTINCT within the UPDATE? Bonus question 3. Am I thinking of this all wrong? I could do this procedurally in my host language (Python) in three lines using a hash table and an assertion, but since this is my first-ever attempt at using SQL relationally (as opposed to as a convenient .CSV replacement) I'm trying to be disciplined about it and use it as a learning experience, rather than sticking with methods I already know. Also, this project is going to be updated day-to-day by non-programmers, so I'd like the constraints as deep inside the DB as possible. Thank you all so much for your patience. I feel like this is in Chapter 1 of all the database books I haven't read, and not in any of the ones I do have (at least that I can find.) Maybe I'm being punished for splitting the base table on part of its primary key? -gc3 _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users