Jeff - I'm pretty new to databases myself, but I have seen examples of schemas that resemble your first alternative, i.e. that involve the creation of a third table containing just the linkages between the primary and secondary tables.
And in fact, I think that is the right solution for your appli- cation. Although the link table duplicates the id1 column, it contains no wasted (NULL) entries. The second alternative (my proposal) doesn't duplicate id1, but 90% of the id2's are NULL. Since the primary table is very large, this represents an excessive waste of space. - Richard Jeff Gibson wrote: > Right. Hence my hesitation. :-) I suppose it's possible to check > uniqueness once at the end in O(N), but it would also take O(N) extra > storage, and I doubt sqlite is doing that... > > One thing your earlier suggestion brought up. The way I was hooking up > tables before was something along the lines of: > > CREATE TABLE primary(id1 INTEGER PRIMARY KEY, <primary_fields>); > CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, <secondary_fields>); > CREATE TABLE link(id1 INTEGER, id2 INTEGER); > > My understanding of your suggestion is: > > CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, > <primary_fields>); > CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, <secondary_fields>); > > with the understanding that id2 in primary will often be NULL. Are > there any circumstances where the first alternative is more > appropriate? I'm pretty new to databases, but I got my original idea > from a few schemas that I've seen. I'm just trying to understand the > trade-offs. > Thanks a lot for your help, > Jeff > > Richard Klein wrote: >> On second thought, checking the entire table for uniqueness would seem >> to require O(N log N), regardless of whether it is done one INSERT at >> a time, or all at once after the table is created! >> >> - Richard >> >> Richard Klein wrote: >> >>> Quoting from the description of CREATE TABLE in the SQL Syntax section of >>> the SQLite documentation: >>> >>> "The UNIQUE constraint causes an index to be created on the specified >>> columns. >>> This index must contain unique keys." >>> >>> The creation of an index would seem to imply an O(log N) search on each >>> insertion, so you should be okay. >>> >>> My advice would be to try it and see. If table creation takes too long, >>> you can always remove the UNIQUE constraint, and then write a routine to >>> check the table for uniqueness after it's created. >>> >>> - Richard >>> >>> >>> Jeff Gibson wrote: >>> >>>> I see. Fortunately my application simplifies this since the database is >>>> created once and read many times, but is never modified after creation >>>> time. Regarding constraints, I was thinking it might be helpful to add >>>> a few where applicable (whether foreign key constraints or even simple >>>> uniqueness constraints) basically as assertions, but I was worried about >>>> the overhead it would take to enforce them when I'm creating the >>>> database. Do you know if a uniqueness constraint, for instance, does an >>>> O(N) search on each insertion? If so, it sounds prohibitive. >>>> Thanks, >>>> Jeff >>>> >>>> Richard Klein wrote: >>>> >>>>> Jeff, >>>>> >>>>> I think that's the right way to go for your application. There are a few >>>>> things you should be aware of regarding this approach. >>>>> >>>>> A column in one table that references a column in another table is called >>>>> a "foreign key" in database lingo. >>>>> >>>>> An issue with foreign keys is that it is important to keep the referencing >>>>> table (big_table) in sync with the referenced table (secondary_table). >>>>> >>>>> For example, if you delete an entry from secondary_table, you want to >>>>> update >>>>> the foreign key column in all entries in big_table that reference that >>>>> entry. >>>>> >>>>> What's the proper way to update the foreign key? It depends on your >>>>> appli- >>>>> cation. You might want to set the foreign key in the referencing entries >>>>> to >>>>> NULL, or you might want to delete the referencing entries, or you might >>>>> want >>>>> to do something else. >>>>> >>>>> In standard, full-blown SQL, you can define the synchronization behavior >>>>> you >>>>> want with a "foreign key constraint". That is, you might create big_table >>>>> as follows: >>>>> >>>>> CREATE TABLE big_table ( >>>>> id INTEGER PRIMARY KEY, >>>>> col1 INTEGER, >>>>> col2 REAL, >>>>> col3 TEXT, >>>>> col4 BLOB, >>>>> col5 INTEGER, >>>>> CONSTRAINT col5_fk FOREIGN KEY(col5) >>>>> REFERENCES secondary_table(id) ON DELETE SET NULL, >>>>> ); >>>>> >>>>> This would define col5 as a foreign key referencing the id column of >>>>> secondary_ >>>>> table, and would specify that col5 should be set to NULL in all >>>>> referencing >>>>> entries in big_table when an entry in secondary_table is deleted. >>>>> >>>>> Unfortunately, SQLite does not implement foreign key constraints. More >>>>> precisely, >>>>> they don't cause syntax errors, but they aren't enforced. Therefore, you >>>>> will >>>>> have to implement the desired synchronization behavior yourself. >>>>> Fortunately, >>>>> this is easy to do with the use of TRIGGERs, which *are* implemented in >>>>> SQLite. >>>>> >>>>> Here are some links that might be useful: >>>>> >>>>> Foreign keys: http://en.wikipedia.org/wiki/Foreign_key >>>>> SQLite triggers: http://www.sqlite.org/lang_createtrigger.html >>>>> >>>>> Hope this helps, >>>>> - Richard >>>>> >>>>> >>>>> >>>>>> Thanks! I'll give that a try. >>>>>> Jeff >>>>>> >>>>>> Richard Klein wrote: >>>>>> >>>>>> >>>>>>>> Whether or not the the secondary columns are needed is a function of >>>>>>>> one >>>>>>>> of the primary columns. That function involves values from another >>>>>>>> table, though, so the general case would require a join. That other >>>>>>>> table is small, however, so I generally cache it outside the database. >>>>>>>> >>>>>>>> Some pseudocode for my expected use would be something like: >>>>>>>> >>>>>>>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion") >>>>>>>> while(step()) { >>>>>>>> >>>>>>>> if( F(primary_column_values) ) { >>>>>>>> Fetch secondary values >>>>>>>> } >>>>>>>> >>>>>>>> do something with primary and maybe secondary values; >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> Where F would be implemented outside the database. >>>>>>>> Thanks, >>>>>>>> Jeff >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> I assume that the primary SELECT shown above can be made suitably fast >>>>>>> by creating the appropriate indices on big_table. >>>>>>> >>>>>>> If the secondary columns are kept in a separate, secondary_table, and >>>>>>> a fifth primary column is added that contains the ROWID of the approp- >>>>>>> riate entry in the secondary_table (or NULL if the secondary_table is >>>>>>> not needed), then the "Fetch secondary values" operation should be very >>>>>>> fast as well. >>>>>>> >>>>>>> It seems to me that this approach would be faster than a join, and >>>>>>> would consume less space than an 8-column table containing mostly >>>>>>> NULLs in the secondary columns. >>>>>>> >>>>>>> Of course, this approach would cost you some extra space, in the form >>>>>>> of the 5th primary column containing the secondary ROWID. >>>>>>> >>>>>>> - Richard Klein >>>>>>> >>>>>>> _______________________________________________ >>>>>>> sqlite-users mailing list >>>>>>> sqlite-users@sqlite.org >>>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>>>>> >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> sqlite-users mailing list >>>>>> sqlite-users@sqlite.org >>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> sqlite-users mailing list >>>>> sqlite-users@sqlite.org >>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>>> >>>>> >>>> _______________________________________________ >>>> sqlite-users mailing list >>>> sqlite-users@sqlite.org >>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>>> >>> _______________________________________________ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> _______________________________________________ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users