Right. Hence my hesitation. :-) I suppose it's possible to check uniqueness once at the end in O(N), but it would also take O(N) extra storage, and I doubt sqlite is doing that...
One thing your earlier suggestion brought up. The way I was hooking up tables before was something along the lines of: CREATE TABLE primary(id1 INTEGER PRIMARY KEY, <primary_fields>); CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, <secondary_fields>); CREATE TABLE link(id1 INTEGER, id2 INTEGER); My understanding of your suggestion is: CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, <primary_fields>); CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, <secondary_fields>); with the understanding that id2 in primary will often be NULL. Are there any circumstances where the first alternative is more appropriate? I'm pretty new to databases, but I got my original idea from a few schemas that I've seen. I'm just trying to understand the trade-offs. Thanks a lot for your help, Jeff Richard Klein wrote: > On second thought, checking the entire table for uniqueness would seem > to require O(N log N), regardless of whether it is done one INSERT at > a time, or all at once after the table is created! > > - Richard > > Richard Klein wrote: > >> Quoting from the description of CREATE TABLE in the SQL Syntax section of >> the SQLite documentation: >> >> "The UNIQUE constraint causes an index to be created on the specified >> columns. >> This index must contain unique keys." >> >> The creation of an index would seem to imply an O(log N) search on each >> insertion, so you should be okay. >> >> My advice would be to try it and see. If table creation takes too long, >> you can always remove the UNIQUE constraint, and then write a routine to >> check the table for uniqueness after it's created. >> >> - Richard >> >> >> Jeff Gibson wrote: >> >>> I see. Fortunately my application simplifies this since the database is >>> created once and read many times, but is never modified after creation >>> time. Regarding constraints, I was thinking it might be helpful to add >>> a few where applicable (whether foreign key constraints or even simple >>> uniqueness constraints) basically as assertions, but I was worried about >>> the overhead it would take to enforce them when I'm creating the >>> database. Do you know if a uniqueness constraint, for instance, does an >>> O(N) search on each insertion? If so, it sounds prohibitive. >>> Thanks, >>> Jeff >>> >>> Richard Klein wrote: >>> >>>> Jeff, >>>> >>>> I think that's the right way to go for your application. There are a few >>>> things you should be aware of regarding this approach. >>>> >>>> A column in one table that references a column in another table is called >>>> a "foreign key" in database lingo. >>>> >>>> An issue with foreign keys is that it is important to keep the referencing >>>> table (big_table) in sync with the referenced table (secondary_table). >>>> >>>> For example, if you delete an entry from secondary_table, you want to >>>> update >>>> the foreign key column in all entries in big_table that reference that >>>> entry. >>>> >>>> What's the proper way to update the foreign key? It depends on your appli- >>>> cation. You might want to set the foreign key in the referencing entries >>>> to >>>> NULL, or you might want to delete the referencing entries, or you might >>>> want >>>> to do something else. >>>> >>>> In standard, full-blown SQL, you can define the synchronization behavior >>>> you >>>> want with a "foreign key constraint". That is, you might create big_table >>>> as follows: >>>> >>>> CREATE TABLE big_table ( >>>> id INTEGER PRIMARY KEY, >>>> col1 INTEGER, >>>> col2 REAL, >>>> col3 TEXT, >>>> col4 BLOB, >>>> col5 INTEGER, >>>> CONSTRAINT col5_fk FOREIGN KEY(col5) >>>> REFERENCES secondary_table(id) ON DELETE SET NULL, >>>> ); >>>> >>>> This would define col5 as a foreign key referencing the id column of >>>> secondary_ >>>> table, and would specify that col5 should be set to NULL in all referencing >>>> entries in big_table when an entry in secondary_table is deleted. >>>> >>>> Unfortunately, SQLite does not implement foreign key constraints. More >>>> precisely, >>>> they don't cause syntax errors, but they aren't enforced. Therefore, you >>>> will >>>> have to implement the desired synchronization behavior yourself. >>>> Fortunately, >>>> this is easy to do with the use of TRIGGERs, which *are* implemented in >>>> SQLite. >>>> >>>> Here are some links that might be useful: >>>> >>>> Foreign keys: http://en.wikipedia.org/wiki/Foreign_key >>>> SQLite triggers: http://www.sqlite.org/lang_createtrigger.html >>>> >>>> Hope this helps, >>>> - Richard >>>> >>>> >>>> >>>>> Thanks! I'll give that a try. >>>>> Jeff >>>>> >>>>> Richard Klein wrote: >>>>> >>>>> >>>>>>> Whether or not the the secondary columns are needed is a function of >>>>>>> one >>>>>>> of the primary columns. That function involves values from another >>>>>>> table, though, so the general case would require a join. That other >>>>>>> table is small, however, so I generally cache it outside the database. >>>>>>> Some pseudocode for my expected use would be something like: >>>>>>> >>>>>>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion") >>>>>>> while(step()) { >>>>>>> >>>>>>> if( F(primary_column_values) ) { >>>>>>> Fetch secondary values >>>>>>> } >>>>>>> >>>>>>> do something with primary and maybe secondary values; >>>>>>> >>>>>>> } >>>>>>> >>>>>>> Where F would be implemented outside the database. >>>>>>> Thanks, >>>>>>> Jeff >>>>>>> >>>>>>> >>>>>>> >>>>>> I assume that the primary SELECT shown above can be made suitably fast >>>>>> by creating the appropriate indices on big_table. >>>>>> >>>>>> If the secondary columns are kept in a separate, secondary_table, and >>>>>> a fifth primary column is added that contains the ROWID of the approp- >>>>>> riate entry in the secondary_table (or NULL if the secondary_table is >>>>>> not needed), then the "Fetch secondary values" operation should be very >>>>>> fast as well. >>>>>> >>>>>> It seems to me that this approach would be faster than a join, and >>>>>> would consume less space than an 8-column table containing mostly >>>>>> NULLs in the secondary columns. >>>>>> >>>>>> Of course, this approach would cost you some extra space, in the form >>>>>> of the 5th primary column containing the secondary ROWID. >>>>>> >>>>>> - Richard Klein >>>>>> >>>>>> _______________________________________________ >>>>>> sqlite-users mailing list >>>>>> sqlite-users@sqlite.org >>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> sqlite-users mailing list >>>>> sqlite-users@sqlite.org >>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>>> >>>>> >>>> _______________________________________________ >>>> sqlite-users mailing list >>>> sqlite-users@sqlite.org >>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>> >>>> >>> _______________________________________________ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> _______________________________________________ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users