Re: [sqlite] schema design question
On Fri, Apr 11, 2008 at 01:54:43PM -0700, Richard Klein scratched on the wall: > Jay A. Kreibich wrote: > >On Thu, Apr 10, 2008 at 05:58:59PM -0700, Richard Klein scratched on the > >wall: > > > >>My advice would be to try it and see. If table creation takes too long, > >>you can always remove the UNIQUE constraint, and then write a routine to > >>check the table for uniqueness after it's created. > > > > That "routine" could simply be creating an explicit unique index on > > the column after all the data is loaded. If the index can be > > created, you're good to go. > > > > This is still a slow process, but I know the creation of indexes on > > existing tables is one area the development team hopes to speed up. > > I'm not sure what the priority of that is, however. > > Wouldn't this take just as long as creating the index immediately after > creating the table, and then letting each INSERT update the index? There are savings to be found when you have the whole set of indexes available to you, just as you get some savings from batching up INSERTs. If you have a large enough page cache, you might be able to do a substantial part of the sort in one disk pass. If the sorter has the whole set avaliable, there's also the issue of how you deal with disk management and what sorting algorithm is used. Current thought is that there is room for improvement: http://www.mail-archive.com/sqlite-users@sqlite.org/msg31481.html Although that obviously doesn't help in the here and now. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "'People who live in bamboo houses should not throw pandas.' Jesus said that." - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] schema design question
Thanks for all the suggestions. My schema is now a lot cleaner, and my application runs 30% faster! Jeff Richard Klein wrote: >> Jeff Gibson wrote: >> >>> One thing your earlier suggestion brought up. The way I was hooking up >>> tables before was something along the lines of: >>> >>> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, ); >>> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ); >>> CREATE TABLE link(id1 INTEGER, id2 INTEGER); >>> >>> My understanding of your suggestion is: >>> >>> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, >>> ); >>> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ); >>> >>> with the understanding that id2 in primary will often be NULL. Are >>> there any circumstances where the first alternative is more >>> appropriate? I'm pretty new to databases, but I got my original idea >>> from a few schemas that I've seen. I'm just trying to understand the >>> trade-offs. >>> Thanks a lot for your help, >>> Jeff >>> >>> >> These different forms of linking the records are used for different >> types of relations. The two tables can have records that are related in >> a various combinations of one or many to one or many. >> >> one to one >> many to one >> one to many >> many to many >> >> Using a third table is required to implement a many to many relation. >> Each record in the third table stores one item of the relation (i.e >> which record in the first table is related to which record in the second >> table). >> >> A one to many relation is created by assigning an id to the record in >> the one side of the relation and referencing that id in a column on the >> many side of the relation. A many to one relation is the same a one to >> many relation, with the order of the tables reversed. This is what you >> have shown as Richard's suggestion. >> >> A one to one relation can be created by assigning an id to one record >> and using that same id as the primary key on the related record. >> >> For your case, you need a one to one relation between the primary and >> secondary tables. This can be done by using the same id for the related >> record in the secondary table as was assigned to the record in the >> primary table. >> >> CREATE TABLE primary(id INTEGER PRIMARY KEY, ); >> CREATE TABLE secondary(id INTEGER PRIMARY KEY, ); >> >> insert into primary values(null, ); >> insert into secondary values(last_insert_rowid(), ); >> >> When you want to retrieve the records you can use a join >> >> select * from primary join secondary using(id); >> >> or you can use a second select to retrieve the secondary fields using >> the id obtained from the primary field. >> >> select * from primary; >> if (has_secondary()) >> select * from secondary where id = primary.id; >> >> This does not waste any space storing unnecessary null fields. You >> should only resort the more complex foreign keys when you need to >> represent a more complex relation. >> >> HTH >> Dennis Cote >> > > As Dennis points out, I had assumed that the relationship between the > primary and secondary tables was many-to-one, i.e. that several entries > in the primary table could refer to the same entry in the secondary > table. > > If that is not the case -- if the relationship is in fact one-to-one -- > then Dennis's solution is the best one. > > I would use Dennis's two-SELECT approach rather than the join if speed > is an issue. > > - Richard > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] schema design question
Jay A. Kreibich wrote: > On Thu, Apr 10, 2008 at 05:58:59PM -0700, Richard Klein scratched on the wall: > >> My advice would be to try it and see. If table creation takes too long, >> you can always remove the UNIQUE constraint, and then write a routine to >> check the table for uniqueness after it's created. > > That "routine" could simply be creating an explicit unique index on > the column after all the data is loaded. If the index can be > created, you're good to go. > > This is still a slow process, but I know the creation of indexes on > existing tables is one area the development team hopes to speed up. > I'm not sure what the priority of that is, however. > >-j > Wouldn't this take just as long as creating the index immediately after creating the table, and then letting each INSERT update the index? Creating the index after all the data is loaded requires that the first row be inserted into the index Btree, then the second row be inserted into the Btree at the proper place, then the third row, and so on, re-balancing the Btree as necessary. Isn't this exactly the same process that takes place when you create the index first and then insert each row? - Richard ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] schema design question
> Jeff Gibson wrote: >> One thing your earlier suggestion brought up. The way I was hooking up >> tables before was something along the lines of: >> >> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, ); >> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ); >> CREATE TABLE link(id1 INTEGER, id2 INTEGER); >> >> My understanding of your suggestion is: >> >> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, >> ); >> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ); >> >> with the understanding that id2 in primary will often be NULL. Are >> there any circumstances where the first alternative is more >> appropriate? I'm pretty new to databases, but I got my original idea >> from a few schemas that I've seen. I'm just trying to understand the >> trade-offs. >> Thanks a lot for your help, >> Jeff >> > > These different forms of linking the records are used for different > types of relations. The two tables can have records that are related in > a various combinations of one or many to one or many. > > one to one > many to one > one to many > many to many > > Using a third table is required to implement a many to many relation. > Each record in the third table stores one item of the relation (i.e > which record in the first table is related to which record in the second > table). > > A one to many relation is created by assigning an id to the record in > the one side of the relation and referencing that id in a column on the > many side of the relation. A many to one relation is the same a one to > many relation, with the order of the tables reversed. This is what you > have shown as Richard's suggestion. > > A one to one relation can be created by assigning an id to one record > and using that same id as the primary key on the related record. > > For your case, you need a one to one relation between the primary and > secondary tables. This can be done by using the same id for the related > record in the secondary table as was assigned to the record in the > primary table. > > CREATE TABLE primary(id INTEGER PRIMARY KEY, ); > CREATE TABLE secondary(id INTEGER PRIMARY KEY, ); > > insert into primary values(null, ); > insert into secondary values(last_insert_rowid(), ); > > When you want to retrieve the records you can use a join > > select * from primary join secondary using(id); > > or you can use a second select to retrieve the secondary fields using > the id obtained from the primary field. > > select * from primary; > if (has_secondary()) > select * from secondary where id = primary.id; > > This does not waste any space storing unnecessary null fields. You > should only resort the more complex foreign keys when you need to > represent a more complex relation. > > HTH > Dennis Cote As Dennis points out, I had assumed that the relationship between the primary and secondary tables was many-to-one, i.e. that several entries in the primary table could refer to the same entry in the secondary table. If that is not the case -- if the relationship is in fact one-to-one -- then Dennis's solution is the best one. I would use Dennis's two-SELECT approach rather than the join if speed is an issue. - Richard ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] schema design question
Jeff Gibson wrote: > > One thing your earlier suggestion brought up. The way I was hooking up > tables before was something along the lines of: > > CREATE TABLE primary(id1 INTEGER PRIMARY KEY, ); > CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ); > CREATE TABLE link(id1 INTEGER, id2 INTEGER); > > My understanding of your suggestion is: > > CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, > ); > CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ); > > with the understanding that id2 in primary will often be NULL. Are > there any circumstances where the first alternative is more > appropriate? I'm pretty new to databases, but I got my original idea > from a few schemas that I've seen. I'm just trying to understand the > trade-offs. > Thanks a lot for your help, > Jeff > These different forms of linking the records are used for different types of relations. The two tables can have records that are related in a various combinations of one or many to one or many. one to one many to one one to many many to many Using a third table is required to implement a many to many relation. Each record in the third table stores one item of the relation (i.e which record in the first table is related to which record in the second table). A one to many relation is created by assigning an id to the record in the one side of the relation and referencing that id in a column on the many side of the relation. A many to one relation is the same a one to many relation, with the order of the tables reversed. This is what you have shown as Richard's suggestion. A one to one relation can be created by assigning an id to one record and using that same id as the primary key on the related record. For your case, you need a one to one relation between the primary and secondary tables. This can be done by using the same id for the related record in the secondary table as was assigned to the record in the primary table. CREATE TABLE primary(id INTEGER PRIMARY KEY, ); CREATE TABLE secondary(id INTEGER PRIMARY KEY, ); insert into primary values(null, ); insert into secondary values(last_insert_rowid(), ); When you want to retrieve the records you can use a join select * from primary join secondary using(id); or you can use a second select to retrieve the secondary fields using the id obtained from the primary field. select * from primary; if (has_secondary()) select * from secondary where id = primary.id; This does not waste any space storing unnecessary null fields. You should only resort the more complex foreign keys when you need to represent a more complex relation. HTH Dennis Cote ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] schema design question
On Thu, Apr 10, 2008 at 05:58:59PM -0700, Richard Klein scratched on the wall: > My advice would be to try it and see. If table creation takes too long, > you can always remove the UNIQUE constraint, and then write a routine to > check the table for uniqueness after it's created. That "routine" could simply be creating an explicit unique index on the column after all the data is loaded. If the index can be created, you're good to go. This is still a slow process, but I know the creation of indexes on existing tables is one area the development team hopes to speed up. I'm not sure what the priority of that is, however. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "'People who live in bamboo houses should not throw pandas.' Jesus said that." - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] schema design question
Jeff - I'm pretty new to databases myself, but I have seen examples of schemas that resemble your first alternative, i.e. that involve the creation of a third table containing just the linkages between the primary and secondary tables. And in fact, I think that is the right solution for your appli- cation. Although the link table duplicates the id1 column, it contains no wasted (NULL) entries. The second alternative (my proposal) doesn't duplicate id1, but 90% of the id2's are NULL. Since the primary table is very large, this represents an excessive waste of space. - Richard Jeff Gibson wrote: > Right. Hence my hesitation. :-) I suppose it's possible to check > uniqueness once at the end in O(N), but it would also take O(N) extra > storage, and I doubt sqlite is doing that... > > One thing your earlier suggestion brought up. The way I was hooking up > tables before was something along the lines of: > > CREATE TABLE primary(id1 INTEGER PRIMARY KEY, ); > CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ); > CREATE TABLE link(id1 INTEGER, id2 INTEGER); > > My understanding of your suggestion is: > > CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, > ); > CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ); > > with the understanding that id2 in primary will often be NULL. Are > there any circumstances where the first alternative is more > appropriate? I'm pretty new to databases, but I got my original idea > from a few schemas that I've seen. I'm just trying to understand the > trade-offs. > Thanks a lot for your help, > Jeff > > Richard Klein wrote: >> On second thought, checking the entire table for uniqueness would seem >> to require O(N log N), regardless of whether it is done one INSERT at >> a time, or all at once after the table is created! >> >> - Richard >> >> Richard Klein wrote: >> >>> Quoting from the description of CREATE TABLE in the SQL Syntax section of >>> the SQLite documentation: >>> >>> "The UNIQUE constraint causes an index to be created on the specified >>> columns. >>> This index must contain unique keys." >>> >>> The creation of an index would seem to imply an O(log N) search on each >>> insertion, so you should be okay. >>> >>> My advice would be to try it and see. If table creation takes too long, >>> you can always remove the UNIQUE constraint, and then write a routine to >>> check the table for uniqueness after it's created. >>> >>> - Richard >>> >>> >>> Jeff Gibson wrote: >>> I see. Fortunately my application simplifies this since the database is created once and read many times, but is never modified after creation time. Regarding constraints, I was thinking it might be helpful to add a few where applicable (whether foreign key constraints or even simple uniqueness constraints) basically as assertions, but I was worried about the overhead it would take to enforce them when I'm creating the database. Do you know if a uniqueness constraint, for instance, does an O(N) search on each insertion? If so, it sounds prohibitive. Thanks, Jeff Richard Klein wrote: > Jeff, > > I think that's the right way to go for your application. There are a few > things you should be aware of regarding this approach. > > A column in one table that references a column in another table is called > a "foreign key" in database lingo. > > An issue with foreign keys is that it is important to keep the referencing > table (big_table) in sync with the referenced table (secondary_table). > > For example, if you delete an entry from secondary_table, you want to > update > the foreign key column in all entries in big_table that reference that > entry. > > What's the proper way to update the foreign key? It depends on your > appli- > cation. You might want to set the foreign key in the referencing entries > to > NULL, or you might want to delete the referencing entries, or you might > want > to do something else. > > In standard, full-blown SQL, you can define the synchronization behavior > you > want with a "foreign key constraint". That is, you might create big_table > as follows: > > CREATE TABLE big_table ( > idINTEGER PRIMARY KEY, > col1 INTEGER, > col2 REAL, > col3 TEXT, > col4 BLOB, > col5 INTEGER, > CONSTRAINT col5_fk FOREIGN KEY(col5) >REFERENCES secondary_table(id) ON DELETE SET NULL, > ); > > This would define col5 as a foreign key referencing the id column of > secondary_ > table, and would specify that col5 should be set to NULL in all > referencing > entries in big_table when an entry in secondary_table is deleted. > > Unfortunately, SQLite does not implement foreign key constraints. More > precisely, > they
Re: [sqlite] schema design question
Right. Hence my hesitation. :-) I suppose it's possible to check uniqueness once at the end in O(N), but it would also take O(N) extra storage, and I doubt sqlite is doing that... One thing your earlier suggestion brought up. The way I was hooking up tables before was something along the lines of: CREATE TABLE primary(id1 INTEGER PRIMARY KEY, ); CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ); CREATE TABLE link(id1 INTEGER, id2 INTEGER); My understanding of your suggestion is: CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, ); CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ); with the understanding that id2 in primary will often be NULL. Are there any circumstances where the first alternative is more appropriate? I'm pretty new to databases, but I got my original idea from a few schemas that I've seen. I'm just trying to understand the trade-offs. Thanks a lot for your help, Jeff Richard Klein wrote: > On second thought, checking the entire table for uniqueness would seem > to require O(N log N), regardless of whether it is done one INSERT at > a time, or all at once after the table is created! > > - Richard > > Richard Klein wrote: > >> Quoting from the description of CREATE TABLE in the SQL Syntax section of >> the SQLite documentation: >> >> "The UNIQUE constraint causes an index to be created on the specified >> columns. >> This index must contain unique keys." >> >> The creation of an index would seem to imply an O(log N) search on each >> insertion, so you should be okay. >> >> My advice would be to try it and see. If table creation takes too long, >> you can always remove the UNIQUE constraint, and then write a routine to >> check the table for uniqueness after it's created. >> >> - Richard >> >> >> Jeff Gibson wrote: >> >>> I see. Fortunately my application simplifies this since the database is >>> created once and read many times, but is never modified after creation >>> time. Regarding constraints, I was thinking it might be helpful to add >>> a few where applicable (whether foreign key constraints or even simple >>> uniqueness constraints) basically as assertions, but I was worried about >>> the overhead it would take to enforce them when I'm creating the >>> database. Do you know if a uniqueness constraint, for instance, does an >>> O(N) search on each insertion? If so, it sounds prohibitive. >>> Thanks, >>> Jeff >>> >>> Richard Klein wrote: >>> Jeff, I think that's the right way to go for your application. There are a few things you should be aware of regarding this approach. A column in one table that references a column in another table is called a "foreign key" in database lingo. An issue with foreign keys is that it is important to keep the referencing table (big_table) in sync with the referenced table (secondary_table). For example, if you delete an entry from secondary_table, you want to update the foreign key column in all entries in big_table that reference that entry. What's the proper way to update the foreign key? It depends on your appli- cation. You might want to set the foreign key in the referencing entries to NULL, or you might want to delete the referencing entries, or you might want to do something else. In standard, full-blown SQL, you can define the synchronization behavior you want with a "foreign key constraint". That is, you might create big_table as follows: CREATE TABLE big_table ( idINTEGER PRIMARY KEY, col1 INTEGER, col2 REAL, col3 TEXT, col4 BLOB, col5 INTEGER, CONSTRAINT col5_fk FOREIGN KEY(col5) REFERENCES secondary_table(id) ON DELETE SET NULL, ); This would define col5 as a foreign key referencing the id column of secondary_ table, and would specify that col5 should be set to NULL in all referencing entries in big_table when an entry in secondary_table is deleted. Unfortunately, SQLite does not implement foreign key constraints. More precisely, they don't cause syntax errors, but they aren't enforced. Therefore, you will have to implement the desired synchronization behavior yourself. Fortunately, this is easy to do with the use of TRIGGERs, which *are* implemented in SQLite. Here are some links that might be useful: Foreign keys: http://en.wikipedia.org/wiki/Foreign_key SQLite triggers: http://www.sqlite.org/lang_createtrigger.html Hope this helps, - Richard > Thanks! I'll give that a try. >Jeff > > Richard Klein wrote: > > >>> Whether or not the the secondary columns are needed is a function of >>> one >>> of the primary columns. That
Re: [sqlite] schema design question
On second thought, checking the entire table for uniqueness would seem to require O(N log N), regardless of whether it is done one INSERT at a time, or all at once after the table is created! - Richard Richard Klein wrote: > Quoting from the description of CREATE TABLE in the SQL Syntax section of > the SQLite documentation: > > "The UNIQUE constraint causes an index to be created on the specified columns. > This index must contain unique keys." > > The creation of an index would seem to imply an O(log N) search on each > insertion, so you should be okay. > > My advice would be to try it and see. If table creation takes too long, > you can always remove the UNIQUE constraint, and then write a routine to > check the table for uniqueness after it's created. > > - Richard > > > Jeff Gibson wrote: >> I see. Fortunately my application simplifies this since the database is >> created once and read many times, but is never modified after creation >> time. Regarding constraints, I was thinking it might be helpful to add >> a few where applicable (whether foreign key constraints or even simple >> uniqueness constraints) basically as assertions, but I was worried about >> the overhead it would take to enforce them when I'm creating the >> database. Do you know if a uniqueness constraint, for instance, does an >> O(N) search on each insertion? If so, it sounds prohibitive. >> Thanks, >> Jeff >> >> Richard Klein wrote: >>> Jeff, >>> >>> I think that's the right way to go for your application. There are a few >>> things you should be aware of regarding this approach. >>> >>> A column in one table that references a column in another table is called >>> a "foreign key" in database lingo. >>> >>> An issue with foreign keys is that it is important to keep the referencing >>> table (big_table) in sync with the referenced table (secondary_table). >>> >>> For example, if you delete an entry from secondary_table, you want to update >>> the foreign key column in all entries in big_table that reference that >>> entry. >>> >>> What's the proper way to update the foreign key? It depends on your appli- >>> cation. You might want to set the foreign key in the referencing entries to >>> NULL, or you might want to delete the referencing entries, or you might want >>> to do something else. >>> >>> In standard, full-blown SQL, you can define the synchronization behavior you >>> want with a "foreign key constraint". That is, you might create big_table >>> as follows: >>> >>> CREATE TABLE big_table ( >>> idINTEGER PRIMARY KEY, >>> col1 INTEGER, >>> col2 REAL, >>> col3 TEXT, >>> col4 BLOB, >>> col5 INTEGER, >>> CONSTRAINT col5_fk FOREIGN KEY(col5) >>>REFERENCES secondary_table(id) ON DELETE SET NULL, >>> ); >>> >>> This would define col5 as a foreign key referencing the id column of >>> secondary_ >>> table, and would specify that col5 should be set to NULL in all referencing >>> entries in big_table when an entry in secondary_table is deleted. >>> >>> Unfortunately, SQLite does not implement foreign key constraints. More >>> precisely, >>> they don't cause syntax errors, but they aren't enforced. Therefore, you >>> will >>> have to implement the desired synchronization behavior yourself. >>> Fortunately, >>> this is easy to do with the use of TRIGGERs, which *are* implemented in >>> SQLite. >>> >>> Here are some links that might be useful: >>> >>> Foreign keys: http://en.wikipedia.org/wiki/Foreign_key >>> SQLite triggers: http://www.sqlite.org/lang_createtrigger.html >>> >>> Hope this helps, >>> - Richard >>> >>> Thanks! I'll give that a try. Jeff Richard Klein wrote: >> Whether or not the the secondary columns are needed is a function of one >> of the primary columns. That function involves values from another >> table, though, so the general case would require a join. That other >> table is small, however, so I generally cache it outside the database. >> Some pseudocode for my expected use would be something like: >> >> prepare("SELECT primary_columns FROM big_table WHERE some_criterion") >> while(step()) { >> >>if( F(primary_column_values) ) { >> Fetch secondary values >> } >> >> do something with primary and maybe secondary values; >> >> } >> >> Where F would be implemented outside the database. >> Thanks, >> Jeff >> >> > I assume that the primary SELECT shown above can be made suitably fast > by creating the appropriate indices on big_table. > > If the secondary columns are kept in a separate, secondary_table, and > a fifth primary column is added that contains the ROWID of the approp- > riate entry in the secondary_table (or NULL if the secondary_table is > not needed), then the "Fetch secondary values" operation should be very > fast as well.
Re: [sqlite] schema design question
Quoting from the description of CREATE TABLE in the SQL Syntax section of the SQLite documentation: "The UNIQUE constraint causes an index to be created on the specified columns. This index must contain unique keys." The creation of an index would seem to imply an O(log N) search on each insertion, so you should be okay. My advice would be to try it and see. If table creation takes too long, you can always remove the UNIQUE constraint, and then write a routine to check the table for uniqueness after it's created. - Richard Jeff Gibson wrote: > I see. Fortunately my application simplifies this since the database is > created once and read many times, but is never modified after creation > time. Regarding constraints, I was thinking it might be helpful to add > a few where applicable (whether foreign key constraints or even simple > uniqueness constraints) basically as assertions, but I was worried about > the overhead it would take to enforce them when I'm creating the > database. Do you know if a uniqueness constraint, for instance, does an > O(N) search on each insertion? If so, it sounds prohibitive. > Thanks, > Jeff > > Richard Klein wrote: >> Jeff, >> >> I think that's the right way to go for your application. There are a few >> things you should be aware of regarding this approach. >> >> A column in one table that references a column in another table is called >> a "foreign key" in database lingo. >> >> An issue with foreign keys is that it is important to keep the referencing >> table (big_table) in sync with the referenced table (secondary_table). >> >> For example, if you delete an entry from secondary_table, you want to update >> the foreign key column in all entries in big_table that reference that entry. >> >> What's the proper way to update the foreign key? It depends on your appli- >> cation. You might want to set the foreign key in the referencing entries to >> NULL, or you might want to delete the referencing entries, or you might want >> to do something else. >> >> In standard, full-blown SQL, you can define the synchronization behavior you >> want with a "foreign key constraint". That is, you might create big_table >> as follows: >> >> CREATE TABLE big_table ( >> idINTEGER PRIMARY KEY, >> col1 INTEGER, >> col2 REAL, >> col3 TEXT, >> col4 BLOB, >> col5 INTEGER, >> CONSTRAINT col5_fk FOREIGN KEY(col5) >>REFERENCES secondary_table(id) ON DELETE SET NULL, >> ); >> >> This would define col5 as a foreign key referencing the id column of >> secondary_ >> table, and would specify that col5 should be set to NULL in all referencing >> entries in big_table when an entry in secondary_table is deleted. >> >> Unfortunately, SQLite does not implement foreign key constraints. More >> precisely, >> they don't cause syntax errors, but they aren't enforced. Therefore, you >> will >> have to implement the desired synchronization behavior yourself. >> Fortunately, >> this is easy to do with the use of TRIGGERs, which *are* implemented in >> SQLite. >> >> Here are some links that might be useful: >> >> Foreign keys: http://en.wikipedia.org/wiki/Foreign_key >> SQLite triggers: http://www.sqlite.org/lang_createtrigger.html >> >> Hope this helps, >> - Richard >> >> >>> Thanks! I'll give that a try. >>>Jeff >>> >>> Richard Klein wrote: >>> > Whether or not the the secondary columns are needed is a function of one > of the primary columns. That function involves values from another > table, though, so the general case would require a join. That other > table is small, however, so I generally cache it outside the database. > Some pseudocode for my expected use would be something like: > > prepare("SELECT primary_columns FROM big_table WHERE some_criterion") > while(step()) { > >if( F(primary_column_values) ) { > Fetch secondary values > } > > do something with primary and maybe secondary values; > > } > > Where F would be implemented outside the database. > Thanks, > Jeff > > I assume that the primary SELECT shown above can be made suitably fast by creating the appropriate indices on big_table. If the secondary columns are kept in a separate, secondary_table, and a fifth primary column is added that contains the ROWID of the approp- riate entry in the secondary_table (or NULL if the secondary_table is not needed), then the "Fetch secondary values" operation should be very fast as well. It seems to me that this approach would be faster than a join, and would consume less space than an 8-column table containing mostly NULLs in the secondary columns. Of course, this approach would cost you some extra space, in the form of the 5th primary column containing the secondary ROWID. - Richard Klein
Re: [sqlite] schema design question
I see. Fortunately my application simplifies this since the database is created once and read many times, but is never modified after creation time. Regarding constraints, I was thinking it might be helpful to add a few where applicable (whether foreign key constraints or even simple uniqueness constraints) basically as assertions, but I was worried about the overhead it would take to enforce them when I'm creating the database. Do you know if a uniqueness constraint, for instance, does an O(N) search on each insertion? If so, it sounds prohibitive. Thanks, Jeff Richard Klein wrote: > Jeff, > > I think that's the right way to go for your application. There are a few > things you should be aware of regarding this approach. > > A column in one table that references a column in another table is called > a "foreign key" in database lingo. > > An issue with foreign keys is that it is important to keep the referencing > table (big_table) in sync with the referenced table (secondary_table). > > For example, if you delete an entry from secondary_table, you want to update > the foreign key column in all entries in big_table that reference that entry. > > What's the proper way to update the foreign key? It depends on your appli- > cation. You might want to set the foreign key in the referencing entries to > NULL, or you might want to delete the referencing entries, or you might want > to do something else. > > In standard, full-blown SQL, you can define the synchronization behavior you > want with a "foreign key constraint". That is, you might create big_table > as follows: > > CREATE TABLE big_table ( > idINTEGER PRIMARY KEY, > col1 INTEGER, > col2 REAL, > col3 TEXT, > col4 BLOB, > col5 INTEGER, > CONSTRAINT col5_fk FOREIGN KEY(col5) >REFERENCES secondary_table(id) ON DELETE SET NULL, > ); > > This would define col5 as a foreign key referencing the id column of > secondary_ > table, and would specify that col5 should be set to NULL in all referencing > entries in big_table when an entry in secondary_table is deleted. > > Unfortunately, SQLite does not implement foreign key constraints. More > precisely, > they don't cause syntax errors, but they aren't enforced. Therefore, you will > have to implement the desired synchronization behavior yourself. Fortunately, > this is easy to do with the use of TRIGGERs, which *are* implemented in > SQLite. > > Here are some links that might be useful: > > Foreign keys: http://en.wikipedia.org/wiki/Foreign_key > SQLite triggers: http://www.sqlite.org/lang_createtrigger.html > > Hope this helps, > - Richard > > >> Thanks! I'll give that a try. >>Jeff >> >> Richard Klein wrote: >> Whether or not the the secondary columns are needed is a function of one of the primary columns. That function involves values from another table, though, so the general case would require a join. That other table is small, however, so I generally cache it outside the database. Some pseudocode for my expected use would be something like: prepare("SELECT primary_columns FROM big_table WHERE some_criterion") while(step()) { if( F(primary_column_values) ) { Fetch secondary values } do something with primary and maybe secondary values; } Where F would be implemented outside the database. Thanks, Jeff >>> I assume that the primary SELECT shown above can be made suitably fast >>> by creating the appropriate indices on big_table. >>> >>> If the secondary columns are kept in a separate, secondary_table, and >>> a fifth primary column is added that contains the ROWID of the approp- >>> riate entry in the secondary_table (or NULL if the secondary_table is >>> not needed), then the "Fetch secondary values" operation should be very >>> fast as well. >>> >>> It seems to me that this approach would be faster than a join, and >>> would consume less space than an 8-column table containing mostly >>> NULLs in the secondary columns. >>> >>> Of course, this approach would cost you some extra space, in the form >>> of the 5th primary column containing the secondary ROWID. >>> >>> - Richard Klein >>> >>> ___ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >>> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org
Re: [sqlite] schema design question
Jeff, I think that's the right way to go for your application. There are a few things you should be aware of regarding this approach. A column in one table that references a column in another table is called a "foreign key" in database lingo. An issue with foreign keys is that it is important to keep the referencing table (big_table) in sync with the referenced table (secondary_table). For example, if you delete an entry from secondary_table, you want to update the foreign key column in all entries in big_table that reference that entry. What's the proper way to update the foreign key? It depends on your appli- cation. You might want to set the foreign key in the referencing entries to NULL, or you might want to delete the referencing entries, or you might want to do something else. In standard, full-blown SQL, you can define the synchronization behavior you want with a "foreign key constraint". That is, you might create big_table as follows: CREATE TABLE big_table ( idINTEGER PRIMARY KEY, col1 INTEGER, col2 REAL, col3 TEXT, col4 BLOB, col5 INTEGER, CONSTRAINT col5_fk FOREIGN KEY(col5) REFERENCES secondary_table(id) ON DELETE SET NULL, ); This would define col5 as a foreign key referencing the id column of secondary_ table, and would specify that col5 should be set to NULL in all referencing entries in big_table when an entry in secondary_table is deleted. Unfortunately, SQLite does not implement foreign key constraints. More precisely, they don't cause syntax errors, but they aren't enforced. Therefore, you will have to implement the desired synchronization behavior yourself. Fortunately, this is easy to do with the use of TRIGGERs, which *are* implemented in SQLite. Here are some links that might be useful: Foreign keys: http://en.wikipedia.org/wiki/Foreign_key SQLite triggers: http://www.sqlite.org/lang_createtrigger.html Hope this helps, - Richard > Thanks! I'll give that a try. >Jeff > > Richard Klein wrote: >>> Whether or not the the secondary columns are needed is a function of one >>> of the primary columns. That function involves values from another >>> table, though, so the general case would require a join. That other >>> table is small, however, so I generally cache it outside the database. >>> Some pseudocode for my expected use would be something like: >>> >>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion") >>> while(step()) { >>> >>>if( F(primary_column_values) ) { >>> Fetch secondary values >>> } >>> >>> do something with primary and maybe secondary values; >>> >>> } >>> >>> Where F would be implemented outside the database. >>> Thanks, >>> Jeff >>> >> I assume that the primary SELECT shown above can be made suitably fast >> by creating the appropriate indices on big_table. >> >> If the secondary columns are kept in a separate, secondary_table, and >> a fifth primary column is added that contains the ROWID of the approp- >> riate entry in the secondary_table (or NULL if the secondary_table is >> not needed), then the "Fetch secondary values" operation should be very >> fast as well. >> >> It seems to me that this approach would be faster than a join, and >> would consume less space than an 8-column table containing mostly >> NULLs in the secondary columns. >> >> Of course, this approach would cost you some extra space, in the form >> of the 5th primary column containing the secondary ROWID. >> >> - Richard Klein >> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] schema design question
Thanks! I'll give that a try. Jeff Richard Klein wrote: >> Whether or not the the secondary columns are needed is a function of one >> of the primary columns. That function involves values from another >> table, though, so the general case would require a join. That other >> table is small, however, so I generally cache it outside the database. >> Some pseudocode for my expected use would be something like: >> >> prepare("SELECT primary_columns FROM big_table WHERE some_criterion") >> while(step()) { >> >>if( F(primary_column_values) ) { >> Fetch secondary values >> } >> >> do something with primary and maybe secondary values; >> >> } >> >> Where F would be implemented outside the database. >> Thanks, >> Jeff >> > > I assume that the primary SELECT shown above can be made suitably fast > by creating the appropriate indices on big_table. > > If the secondary columns are kept in a separate, secondary_table, and > a fifth primary column is added that contains the ROWID of the approp- > riate entry in the secondary_table (or NULL if the secondary_table is > not needed), then the "Fetch secondary values" operation should be very > fast as well. > > It seems to me that this approach would be faster than a join, and > would consume less space than an 8-column table containing mostly > NULLs in the secondary columns. > > Of course, this approach would cost you some extra space, in the form > of the 5th primary column containing the secondary ROWID. > > - Richard Klein > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] schema design question
> Whether or not the the secondary columns are needed is a function of one > of the primary columns. That function involves values from another > table, though, so the general case would require a join. That other > table is small, however, so I generally cache it outside the database. > Some pseudocode for my expected use would be something like: > > prepare("SELECT primary_columns FROM big_table WHERE some_criterion") > while(step()) { > >if( F(primary_column_values) ) { > Fetch secondary values > } > > do something with primary and maybe secondary values; > > } > > Where F would be implemented outside the database. > Thanks, > Jeff I assume that the primary SELECT shown above can be made suitably fast by creating the appropriate indices on big_table. If the secondary columns are kept in a separate, secondary_table, and a fifth primary column is added that contains the ROWID of the approp- riate entry in the secondary_table (or NULL if the secondary_table is not needed), then the "Fetch secondary values" operation should be very fast as well. It seems to me that this approach would be faster than a join, and would consume less space than an 8-column table containing mostly NULLs in the secondary columns. Of course, this approach would cost you some extra space, in the form of the 5th primary column containing the secondary ROWID. - Richard Klein ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] schema design question
Whether or not the the secondary columns are needed is a function of one of the primary columns. That function involves values from another table, though, so the general case would require a join. That other table is small, however, so I generally cache it outside the database. Some pseudocode for my expected use would be something like: prepare("SELECT primary_columns FROM big_table WHERE some_criterion") while(step()) { if( F(primary_column_values) ) { Fetch secondary values } do something with primary and maybe secondary values; } Where F would be implemented outside the database. Thanks, Jeff Richard Klein wrote: >> I'm pretty new to databases, and I have a schema design question. I >> don't know enough about the guts of how sqlite works to know how to make >> some tradeoffs. I have a large (potentially millions of entries) table >> and it has 4 columns which are needed for every entry, and 4 more that >> are needed for about 10% of the entries. I'm trying to decide whether I >> want one table with 8 columns with a bunch of NULLs or two tables with >> no NULLs that will require a join to get all of the 8 column values. I >> assume this is a space/performance tradeoff, since I would think >> searching one table would be a lot faster than doing a join, but I'm not >> sure what the impact would be in terms of disk/memory/performance of all >> those NULLs. >>Does anybody have any suggestions? >>Thanks, >>Jeff >> > > Can you give us a little more information? Specifically, is there any > way to tell, by looking at the 4 primary columns, that you are dealing > with one of the 10% entries that requires looking at the 4 secondary > columns? > > - RichardKlein > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] schema design question
> I'm pretty new to databases, and I have a schema design question. I > don't know enough about the guts of how sqlite works to know how to make > some tradeoffs. I have a large (potentially millions of entries) table > and it has 4 columns which are needed for every entry, and 4 more that > are needed for about 10% of the entries. I'm trying to decide whether I > want one table with 8 columns with a bunch of NULLs or two tables with > no NULLs that will require a join to get all of the 8 column values. I > assume this is a space/performance tradeoff, since I would think > searching one table would be a lot faster than doing a join, but I'm not > sure what the impact would be in terms of disk/memory/performance of all > those NULLs. >Does anybody have any suggestions? >Thanks, >Jeff Can you give us a little more information? Specifically, is there any way to tell, by looking at the 4 primary columns, that you are dealing with one of the 10% entries that requires looking at the 4 secondary columns? - RichardKlein ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] schema design question
I'm pretty new to databases, and I have a schema design question. I don't know enough about the guts of how sqlite works to know how to make some tradeoffs. I have a large (potentially millions of entries) table and it has 4 columns which are needed for every entry, and 4 more that are needed for about 10% of the entries. I'm trying to decide whether I want one table with 8 columns with a bunch of NULLs or two tables with no NULLs that will require a join to get all of the 8 column values. I assume this is a space/performance tradeoff, since I would think searching one table would be a lot faster than doing a join, but I'm not sure what the impact would be in terms of disk/memory/performance of all those NULLs. Does anybody have any suggestions? Thanks, Jeff ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users