Re: [sqlite] Index and ORDER BY
Yes, this has been my experience as well. I've tried 3.5.6 and 3.5.9.
Jeff

Alexey Pechnikov wrote:
> In a message dated Tuesday 01 July 2008 23:47:50, [EMAIL PROTECTED] wrote:
>
>> On Tue, 1 Jul 2008, Alexey Pechnikov wrote:
>>
>>> Is there any difference between "CREATE INDEX ev_idx ON events(type,eid)"
>>> and "CREATE INDEX ev_idx ON events(type,eid desc)"? What is the "desc"
>>> keyword for an index?
>>
>> The DESC keyword creates the index in descending collation order, rather
>> than ascending order (the default). I believe this sort order may not be
>> observed in older versions, but more recent ones do observe it.
>
> I'm using SQLite 3.5.9 and in my tests there is no difference between DESC
> and default indexes. I tried creating an index with the DESC keyword to
> optimize descending sorts, but it didn't work for me. My tests are shown
> above.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Index and ORDER BY
I see. It turns out that the selectivity of "type" is highly variable - some types are very common and some are quite rare. What made me curious is that when I have an index on type and I look for the first few entries in ascending order, the query is very fast - it seems to do the indexed search on type and then scan the matching rows in ascending order. For descending order, it seems to have to find all matching rows and then return the last one (I don't know for sure how it works, but that fits the performance measurements). Is there any way to have it use the index on type and then scan the matching rows in descending order so that no sorting is required? What confuses me is that it seems to scan in descending order when only the primary key is involved, but not when using an index, even if that index has DESC specified.

As you suggested, I have put in the "+type" trick, and that speeds up my common case (where a matching type is nearby, so the linear search isn't so bad). Now I'm in the much better situation of only worrying about the hypothetical case where the scan has to go a long way to find a matching type.
Thanks,
Jeff

D. Richard Hipp wrote:
> On Jul 1, 2008, at 2:17 PM, Jeff Gibson wrote:
>
>> I'm including a copy of Alexey's relevant message below. Unless I
>> misunderstand, he has a test case that demonstrates that for the table:
>>
>> CREATE TABLE events (eid INTEGER PRIMARY KEY, type INTEGER)
>>
>> the query:
>>
>> SELECT events.* FROM events WHERE eid<=32619760 and type=22
>> ORDER BY eid DESC LIMIT 1;
>>
>> runs much faster if there is no index on type. The culprit seems to be
>> the <= in conjunction with the descending ordering. If you change
>> either, the query gets much faster. He tried using the indices
>> events(type,eid) and events(type,eid desc). I also tried your original
>> suggestion of just events(type) and got the same result.
>
> That would be a case of SQLite choosing a suboptimal index, which is
> very different from ignoring an index altogether, which is what your
> original statement said. I see that if there is an index on
> events(type), that index is used rather than the primary key. This is
> because the query optimizer assumes that type=22 is highly selective.
> Running ANALYZE might help. But a sure-fire solution is to change the
> query as follows:
>
> SELECT * FROM events
>  WHERE eid<=32619750
>    AND +type=22
>  ORDER BY eid DESC
>  LIMIT 1;
>
> Note the "+" operator in front of the "type" field in the WHERE
> clause. This + sign makes that term of the WHERE clause an
> expression rather than a constraint on a column, and this
> disqualifies it from use by an index. That forces SQLite to use the
> other query strategy, which is to use the integer primary key.
>
> Note that in this case, the correct index choice depends on the kind
> of data contained in the table. If there is only a single row out of
> 20 million for which type=22, but there are hundreds of thousands of
> rows with eid<=32619750, then the use of the index on events(type) is
> clearly the better strategy. Only when type=22 is a common occurrence
> does it become better to use the integer primary key. SQLite does not
> attempt to keep statistics on table contents, so it has no way of
> knowing which approach is really better. It makes its best guess. In
> this case, it happened to guess wrong. But, as I pointed out, a
> programmer with higher-level knowledge of the table content can steer
> SQLite toward the better choice with the judicious use of a "+" symbol.
>
> D. Richard Hipp
> [EMAIL PROTECTED]
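[Editor's note] The unary "+" trick above can be demonstrated with a small sketch. This is not the poster's database - the table is rebuilt in memory with synthetic data and a smaller row count, via Python's stdlib `sqlite3` module:

```python
import sqlite3

# Reconstruct the thread's schema in memory with synthetic data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (eid INTEGER PRIMARY KEY, type INTEGER)")
con.execute("CREATE INDEX ev_idx ON events(type, eid)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                ((i, i % 50) for i in range(10000)))

def plan(sql):
    # Join the EXPLAIN QUERY PLAN detail strings into one line.
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

plain = plan("SELECT * FROM events WHERE eid<=5000 AND type=22 "
             "ORDER BY eid DESC LIMIT 1")
forced = plan("SELECT * FROM events WHERE eid<=5000 AND +type=22 "
              "ORDER BY eid DESC LIMIT 1")

# "+type=22" is an expression, not a column constraint, so ev_idx is
# disqualified and SQLite walks the integer primary key instead.
assert "ev_idx" not in forced
print(plain)
print(forced)
```

The exact plan text varies by SQLite version, so the sketch only asserts that the "+" form avoids the index rather than matching the 3.5.x plan output quoted in the thread.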
Re: [sqlite] Index and ORDER BY
I'm including a copy of Alexey's relevant message below. Unless I misunderstand, he has a test case that demonstrates that for the table:

CREATE TABLE events (eid INTEGER PRIMARY KEY, type INTEGER)

the query:

SELECT events.* FROM events WHERE eid<=32619760 and type=22 ORDER BY eid DESC LIMIT 1;

runs much faster if there is no index on type. The culprit seems to be the <= in conjunction with the descending ordering. If you change either, the query gets much faster. He tried using the indices events(type,eid) and events(type,eid desc). I also tried your original suggestion of just events(type) and got the same result.
Thanks,
Jeff

D. Richard Hipp wrote:
> On Jul 1, 2008, at 1:24 PM, [EMAIL PROTECTED] wrote:
>
>> Is it a problem in sqlite that it will only optimize "WHERE
>> primary_key<=X ORDER BY primary_key DESC" if it's not using an index?
>> Is it supposed to?
>
> It would be a problem if it were the case. But in every test I have
> tried, SQLite does in fact use an index on WHERE pk<=X ORDER BY pk
> DESC. If you can demonstrate a case where it does not, we will fix it.
>
> D. Richard Hipp
> [EMAIL PROTECTED]

Alexey Pechnikov wrote:
> Really, there is a problem with multi-column indexes. You must use only the
> primary key index for a ">=" WHERE clause with ASC sorting, or a "<=" WHERE
> clause with DESC sorting.
>
> 1. I try with the primary key:
>
> #!/usr/bin/tclsh
> package require sqlite3
> sqlite3 db index_order.db
> db eval {DROP TABLE IF EXISTS events}
> db eval {CREATE TABLE events (eid INTEGER PRIMARY KEY,type INTEGER)}
> db transaction {
>     for {set i 0} {$i<1} {incr i} {
>         set type [expr {$i % 50}]
>         db eval {insert into events values ($i,$type)}
>     }
> }
> db close
>
> So, "type" is equal to ("eid" mod 50).
>
> sqlite> SELECT events.* FROM events WHERE eid<=32619760 and type=22 ORDER BY eid DESC LIMIT 1;
> 32619722|22
> CPU Time: user 0.00 sys 0.00
>
> sqlite> EXPLAIN QUERY PLAN SELECT events.* FROM events WHERE eid<=32619760 and type=22 ORDER BY eid DESC LIMIT 1;
> 0|0|TABLE events USING PRIMARY KEY ORDER BY
>
> Result: this index is good.
>
> 2. And I try with a two-column ascending-order index:
>
> #!/usr/bin/tclsh
> package require sqlite3
> sqlite3 db index_order.db
> db eval {DROP TABLE IF EXISTS events}
> db eval {CREATE TABLE events (eid INTEGER PRIMARY KEY,type INTEGER)}
> db transaction {
>     for {set i 0} {$i<1} {incr i} {
>         set type [expr {$i % 50}]
>         db eval {insert into events values ($i,$type)}
>     }
> }
> db eval {CREATE INDEX ev_idx ON events(type,eid)}
> db close
>
> sqlite> SELECT events.* FROM events WHERE eid<=32619760 and type=22 ORDER BY eid DESC LIMIT 1;
> 32619722|22
> CPU Time: user 1.400088 sys 1.696106
>
> sqlite> EXPLAIN QUERY PLAN SELECT events.* FROM events WHERE eid<=32619760 and type=22 ORDER BY eid DESC LIMIT 1;
> 0|0|TABLE events WITH INDEX ev_idx ORDER BY
>
> Result: this index is bad.
>
> 3.
> And I try with a two-column descending-order index:
>
> #!/usr/bin/tclsh
> package require sqlite3
> sqlite3 db index_order.db
> db eval {DROP TABLE IF EXISTS events}
> db eval {CREATE TABLE events (eid INTEGER PRIMARY KEY,type INTEGER)}
> db transaction {
>     for {set i 0} {$i<1} {incr i} {
>         set type [expr {$i % 50}]
>         db eval {insert into events values ($i,$type)}
>     }
> }
> db eval {CREATE INDEX ev_desc_idx ON events(type asc,eid desc)}
> db close
>
> sqlite> SELECT events.* FROM events WHERE eid<=32619760 and type=22 ORDER BY eid DESC LIMIT 1;
> 32619722|22
> CPU Time: user 0.600037 sys 0.608038
>
> sqlite> EXPLAIN QUERY PLAN SELECT events.* FROM events WHERE eid<=32619760 and type=22 ORDER BY eid DESC LIMIT 1;
> 0|0|TABLE events WITH INDEX ev_desc_idx ORDER BY
>
> And with modified queries:
>
> sqlite> SELECT events.* FROM events WHERE eid>=32619760 and type=22 ORDER BY eid DESC LIMIT 1;
> 9972|22
> CPU Time: user 0.00 sys 0.00
>
> sqlite> SELECT events.* FROM events WHERE eid<=32619760 and type=22 ORDER BY eid ASC LIMIT 1;
> 22|22
> CPU Time: user 0.00 sys 0.004000
>
> sqlite> SELECT events.* FROM events WHERE eid>=32619760 and type=22 ORDER BY eid ASC LIMIT 1;
> 32619772|22
> CPU Time: user 0.284018 sys 0.820051
>
> Result: this index is bad.
>
> P.S. Try with the primary key index only and write your results.
Re: [sqlite] Index and ORDER BY
When I try a similar query (i.e., no type comparison), I get the same results as you:

sqlite> SELECT eid,type FROM EVENTS WHERE eid<=3261976 ORDER BY eid DESC LIMIT 1;
3261976|21
CPU Time: user 0.00 sys 0.027996

sqlite> EXPLAIN QUERY PLAN SELECT eid,type FROM EVENTS WHERE eid<=3261976 ORDER BY eid DESC LIMIT 1;
0|0|TABLE EVENTS USING PRIMARY KEY ORDER BY

As soon as I add the type criterion to the WHERE clause, though, the performance falls way off (the index is on events(type)):

sqlite> SELECT eid,type FROM EVENTS WHERE eid<=3261976 AND type=22 ORDER BY eid DESC LIMIT 1;
3261891|22
CPU Time: user 3.069533 sys 0.485927

sqlite> EXPLAIN QUERY PLAN SELECT eid,type FROM EVENTS WHERE eid<=3261976 AND type=22 ORDER BY eid DESC LIMIT 1;
0|0|TABLE EVENTS WITH INDEX ev4_idx ORDER BY

The fact that it seems able to do a descending sort very quickly when it's only using the primary key led me to try the following, which turned out to be more convoluted and was much slower:

sqlite> SELECT e.eid,e.type FROM events e, (SELECT eid FROM events WHERE eid<=3261976 ORDER BY eid DESC) l WHERE e.eid=l.eid AND type=22 LIMIT 1;
CPU Time: user 29.111574 sys 3.276502

sqlite> EXPLAIN QUERY PLAN SELECT e.eid,e.type FROM events e, (SELECT eid FROM events WHERE eid<=3261976 ORDER BY eid DESC) l WHERE e.eid=l.eid AND type=22 LIMIT 1;
0|0|TABLE events AS e WITH INDEX ev4_idx
1|1|TABLE events USING PRIMARY KEY

It seems to me that sqlite can very efficiently do a descending sort on a primary key by itself, but not when the primary key is used together with an index. Does that sound correct? Also, I'm using sqlite 3.5.6, not 3.5.9. Does that make a difference?
Thanks,
Jeff

Alexey Pechnikov wrote:
> I try with this script on my laptop with 1 Gb RAM:
>
> #!/usr/bin/tclsh
> package require sqlite3
>
> sqlite3 db index_order.db
> db eval {DROP TABLE IF EXISTS events}
> db eval {CREATE TABLE events (eid INTEGER PRIMARY KEY)}
> db eval {CREATE INDEX ev_desc_idx ON events(eid desc)}
> db transaction {
>     for {set i 0} {$i<1} {incr i} {
>         db eval {insert into events values ($i)}
>     }
> }
> db close
>
> SQLite version 3.5.9 is used.
>
> I'm increasing the row count ~10x (and searching for row 32619760 instead
> of 3261976 in your query), and my database size is similar to yours:
>
> $ ls -lh . | grep db
> -rw-r--r-- 1 veter veter 2,4G Jun 29 12:08 index_order.db
>
> Here are my results:
>
> sqlite> SELECT events.* FROM events WHERE eid<=32619760 ORDER BY eid DESC LIMIT 1;
> 32619760
> CPU Time: user 0.00 sys 0.00
>
> sqlite> explain query plan SELECT events.* FROM events WHERE eid<=32619760 ORDER BY eid DESC LIMIT 1;
> 0|0|TABLE events USING PRIMARY KEY ORDER BY
>
> Index ev_desc_idx is not used.
>
> Check your SQLite version and try again with my script.
Re: [sqlite] Index and ORDER BY
Thanks for your help. I created the index:

CREATE INDEX ev4_idx ON event(type);

According to "EXPLAIN QUERY PLAN", it's being used. When I run the query:

SELECT events.* FROM events WHERE (events.type=22) AND (events.eid<=3261976) AND (tid=9) AND (verbose<=1) ORDER BY events.eid DESC LIMIT 1;

it takes about 10 seconds. If I remove the tid and verbose conditions, it changes very little (maybe 9 seconds). If I switch the ordering to ascending, though, the query is seemingly instantaneous. Similarly, if I compare an eid for equality, the query is nearly instantaneous. This is part of a GUI application, so a 10-second delay is highly undesirable. Any other suggestions?
Thanks,
Jeff

D. Richard Hipp wrote:
> On Jun 27, 2008, at 6:28 PM, Jeff Gibson wrote:
>
>> I have a large table and a two-column index:
>>
>> CREATE TABLE events (eid INTEGER PRIMARY KEY,
>>                      time INTEGER,
>>                      aid INTEGER,
>>                      subtype INTEGER,
>>                      type INTEGER,
>>                      tid INTEGER,
>>                      verbose INTEGER);
>>
>> CREATE INDEX ev4_idx ON events (type,eid)
>>
>> When I do the following query:
>>
>> SELECT events.* FROM events WHERE (events.type=22) AND (events.tid=9)
>> AND (events.eid<=3261976) AND (events.verbose<=1) ORDER BY events.eid
>> DESC LIMIT 1;
>>
>> it's very slow. If I switch the ORDER BY to "ASC" instead of "DESC",
>> it's very fast. The query plans for both ascending and descending
>> sorts both say:
>>
>> 0|0|TABLE events WITH INDEX ev4_idx ORDER BY
>>
>> For my application, I sometimes need the first and sometimes need the
>> last match. I tried selecting MAX(eid) instead of using an ORDER BY,
>> but the speed was about the same. Is there any way I can get sqlite to
>> use the index for the descending order-by? Do I need a different
>> index? Or are there any other suggestions?
>
> Every index includes the INTEGER PRIMARY KEY as its last term. So the
> second term in your index is redundant. It might be confusing things.
> I suggest you set up your index as simply:
>
> CREATE INDEX ev4_idx ON event(type);
>
> Or perhaps:
>
> CREATE INDEX ev4_idx ON event(type, tid);
>
> Try that and see if it works better for you.
>
> D. Richard Hipp
> [EMAIL PROTECTED]
[sqlite] Index and ORDER BY
I have a large table and a two-column index:

CREATE TABLE events (eid INTEGER PRIMARY KEY,
                     time INTEGER,
                     aid INTEGER,
                     subtype INTEGER,
                     type INTEGER,
                     tid INTEGER,
                     verbose INTEGER);

CREATE INDEX ev4_idx ON events (type,eid)

When I do the following query:

SELECT events.* FROM events WHERE (events.type=22) AND (events.tid=9) AND (events.eid<=3261976) AND (events.verbose<=1) ORDER BY events.eid DESC LIMIT 1;

it's very slow. If I switch the ORDER BY to "ASC" instead of "DESC", it's very fast. The query plans for both ascending and descending sorts both say:

0|0|TABLE events WITH INDEX ev4_idx ORDER BY

For my application, I sometimes need the first and sometimes need the last match. I tried selecting MAX(eid) instead of using an ORDER BY, but the speed was about the same. Is there any way I can get sqlite to use the index for the descending order-by? Do I need a different index? Or are there any other suggestions?
Thanks,
Jeff
[sqlite] schema optimization question
I'm sorry if this is an obvious question - I'm new to databases. I have an application where the database is used to log a large number of simulation events. The database is written once and read many times (i.e., there are never any inserts or updates after database creation). The three most interesting tables I have are:

CREATE TABLE events (eid INTEGER PRIMARY KEY,
                     time INTEGER,
                     aid INTEGER,
                     subtype INTEGER);

CREATE TABLE actions (aid INTEGER PRIMARY KEY,
                      type INTEGER,
                      seqnum INTEGER,
                      tid INTEGER,
                      instid INTEGER);

CREATE TABLE subtypes (type INTEGER,
                       subtype INTEGER,
                       name TEXT,
                       verbose INTEGER,
                       PRIMARY KEY(type,subtype));

The column names are such that columns in different tables with the same name act as foreign keys. The largest (and most often queried) table is events, and it can have many millions of entries. The actions table is also large (about a fifth as big as events) and subtypes is very small (dozens of entries). My application involves querying events many times, and very common queries include events that match a particular verbose value and/or a particular type value. This leads to queries with one or two joins, and such queries are substantially slower than a query on just the events table. The question is: what can I do to speed up those queries? The obvious answer would be to put type and verbose as columns in the events table, but they would be redundant. Is that par for the course, or is there some best practice I'm overlooking?
Thanks,
Jeff
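[Editor's note] The kind of two-join query described above can be sketched as follows. The schema is the poster's; the data and the exact join conditions are illustrative guesses at what "match a particular verbose and type value" looks like:

```python
import sqlite3

# The poster's three tables, with one synthetic row threaded through them.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (eid INTEGER PRIMARY KEY, time INTEGER,
                     aid INTEGER, subtype INTEGER);
CREATE TABLE actions (aid INTEGER PRIMARY KEY, type INTEGER,
                      seqnum INTEGER, tid INTEGER, instid INTEGER);
CREATE TABLE subtypes (type INTEGER, subtype INTEGER, name TEXT,
                       verbose INTEGER, PRIMARY KEY(type, subtype));
""")
con.execute("INSERT INTO subtypes VALUES (1, 0, 'start', 0)")
con.execute("INSERT INTO actions VALUES (7, 1, 0, 9, 0)")
con.execute("INSERT INTO events VALUES (100, 5, 7, 0)")

# Filtering events on type/verbose values stored two joins away:
rows = con.execute("""
    SELECT e.eid FROM events e
    JOIN actions a ON a.aid = e.aid
    JOIN subtypes s ON s.type = a.type AND s.subtype = e.subtype
    WHERE s.verbose <= 1 AND a.type = 1
""").fetchall()
print(rows)  # -> [(100,)]
```

Since subtypes is tiny, the replies later in this thread (caching the small table outside the database, or splitting rarely needed columns into a secondary table) are ways of avoiding paying for these joins on every query.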
Re: [sqlite] schema design question
Thanks for all the suggestions. My schema is now a lot cleaner, and my application runs 30% faster!
Jeff

Richard Klein wrote:
>> Jeff Gibson wrote:
>>
>>> One thing your earlier suggestion brought up. The way I was hooking up
>>> tables before was something along the lines of:
>>>
>>> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, ...);
>>> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ...);
>>> CREATE TABLE link(id1 INTEGER, id2 INTEGER);
>>>
>>> My understanding of your suggestion is:
>>>
>>> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, ...);
>>> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ...);
>>>
>>> with the understanding that id2 in primary will often be NULL. Are
>>> there any circumstances where the first alternative is more
>>> appropriate? I'm pretty new to databases, but I got my original idea
>>> from a few schemas that I've seen. I'm just trying to understand the
>>> trade-offs.
>>> Thanks a lot for your help,
>>> Jeff
>>
>> These different forms of linking the records are used for different
>> types of relations. The two tables can have records that are related in
>> various combinations of one-or-many to one-or-many:
>>
>> one to one
>> many to one
>> one to many
>> many to many
>>
>> Using a third table is required to implement a many-to-many relation.
>> Each record in the third table stores one item of the relation (i.e.,
>> which record in the first table is related to which record in the
>> second table).
>>
>> A one-to-many relation is created by assigning an id to the record on
>> the "one" side of the relation and referencing that id in a column on
>> the "many" side. A many-to-one relation is the same as a one-to-many
>> relation with the order of the tables reversed. This is what you have
>> shown as Richard's suggestion.
>>
>> A one-to-one relation can be created by assigning an id to one record
>> and using that same id as the primary key of the related record.
>>
>> For your case, you need a one-to-one relation between the primary and
>> secondary tables. This can be done by using the same id for the related
>> record in the secondary table as was assigned to the record in the
>> primary table.
>>
>> CREATE TABLE primary(id INTEGER PRIMARY KEY, ...);
>> CREATE TABLE secondary(id INTEGER PRIMARY KEY, ...);
>>
>> insert into primary values(null, ...);
>> insert into secondary values(last_insert_rowid(), ...);
>>
>> When you want to retrieve the records you can use a join:
>>
>> select * from primary join secondary using(id);
>>
>> or you can use a second select to retrieve the secondary fields using
>> the id obtained from the primary record:
>>
>> select * from primary;
>> if (has_secondary())
>>     select * from secondary where id = primary.id;
>>
>> This does not waste any space storing unnecessary NULL fields. You
>> should only resort to the more complex foreign keys when you need to
>> represent a more complex relation.
>>
>> HTH
>> Dennis Cote
>
> As Dennis points out, I had assumed that the relationship between the
> primary and secondary tables was many-to-one, i.e. that several entries
> in the primary table could refer to the same entry in the secondary
> table.
>
> If that is not the case -- if the relationship is in fact one-to-one --
> then Dennis's solution is the best one.
>
> I would use Dennis's two-SELECT approach rather than the join if speed
> is an issue.
>
> - Richard
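[Editor's note] Dennis's shared-rowid one-to-one pattern can be sketched like this. The column names are placeholders, and the table is called `prim` rather than `primary` because PRIMARY is a reserved word in SQL and the literal `CREATE TABLE primary(...)` from the post would be rejected:

```python
import sqlite3

# Shared-rowid one-to-one link: secondary reuses prim's generated id.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prim (id INTEGER PRIMARY KEY, extra TEXT)")
con.execute("CREATE TABLE secondary (id INTEGER PRIMARY KEY, more TEXT)")

con.execute("INSERT INTO prim VALUES (NULL, 'always present')")
# last_insert_rowid() still refers to the prim insert at this point,
# which is exactly the idiom from the post.
con.execute("INSERT INTO secondary VALUES (last_insert_rowid(), 'optional')")

row = con.execute("SELECT * FROM prim JOIN secondary USING (id)").fetchone()
print(row)  # -> (1, 'always present', 'optional')
```

Rows without secondary data simply get no row in `secondary`, so no NULL columns are stored anywhere.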
Re: [sqlite] schema design question
Right. Hence my hesitation. :-) I suppose it's possible to check uniqueness once at the end in O(N), but it would also take O(N) extra storage, and I doubt sqlite is doing that...

One thing your earlier suggestion brought up. The way I was hooking up tables before was something along the lines of:

CREATE TABLE primary(id1 INTEGER PRIMARY KEY, ...);
CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ...);
CREATE TABLE link(id1 INTEGER, id2 INTEGER);

My understanding of your suggestion is:

CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, ...);
CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, ...);

with the understanding that id2 in primary will often be NULL. Are there any circumstances where the first alternative is more appropriate? I'm pretty new to databases, but I got my original idea from a few schemas that I've seen. I'm just trying to understand the trade-offs.
Thanks a lot for your help,
Jeff

Richard Klein wrote:
> On second thought, checking the entire table for uniqueness would seem
> to require O(N log N), regardless of whether it is done one INSERT at
> a time or all at once after the table is created!
>
> - Richard
>
> Richard Klein wrote:
>
>> Quoting from the description of CREATE TABLE in the SQL Syntax section
>> of the SQLite documentation:
>>
>> "The UNIQUE constraint causes an index to be created on the specified
>> columns. This index must contain unique keys."
>>
>> The creation of an index would seem to imply an O(log N) search on each
>> insertion, so you should be okay.
>>
>> My advice would be to try it and see. If table creation takes too long,
>> you can always remove the UNIQUE constraint, and then write a routine to
>> check the table for uniqueness after it's created.
>>
>> - Richard
>>
>> Jeff Gibson wrote:
>>
>>> I see. Fortunately my application simplifies this since the database is
>>> created once and read many times, but is never modified after creation
>>> time.
>>> Regarding constraints, I was thinking it might be helpful to add
>>> a few where applicable (whether foreign key constraints or even simple
>>> uniqueness constraints), basically as assertions, but I was worried
>>> about the overhead it would take to enforce them when I'm creating the
>>> database. Do you know if a uniqueness constraint, for instance, does an
>>> O(N) search on each insertion? If so, it sounds prohibitive.
>>> Thanks,
>>> Jeff
>>>
>>> Richard Klein wrote:
>>>
>>>> Jeff,
>>>>
>>>> I think that's the right way to go for your application. There are a
>>>> few things you should be aware of regarding this approach.
>>>>
>>>> A column in one table that references a column in another table is
>>>> called a "foreign key" in database lingo.
>>>>
>>>> An issue with foreign keys is that it is important to keep the
>>>> referencing table (big_table) in sync with the referenced table
>>>> (secondary_table).
>>>>
>>>> For example, if you delete an entry from secondary_table, you want to
>>>> update the foreign key column in all entries in big_table that
>>>> reference that entry.
>>>>
>>>> What's the proper way to update the foreign key? It depends on your
>>>> application. You might want to set the foreign key in the referencing
>>>> entries to NULL, or you might want to delete the referencing entries,
>>>> or you might want to do something else.
>>>>
>>>> In standard, full-blown SQL, you can define the synchronization
>>>> behavior you want with a "foreign key constraint".
>>>> That is, you might create big_table as follows:
>>>>
>>>> CREATE TABLE big_table (
>>>>     id   INTEGER PRIMARY KEY,
>>>>     col1 INTEGER,
>>>>     col2 REAL,
>>>>     col3 TEXT,
>>>>     col4 BLOB,
>>>>     col5 INTEGER,
>>>>     CONSTRAINT col5_fk FOREIGN KEY(col5)
>>>>         REFERENCES secondary_table(id) ON DELETE SET NULL
>>>> );
>>>>
>>>> This would define col5 as a foreign key referencing the id column of
>>>> secondary_table, and would specify that col5 should be set to NULL in
>>>> all referencing entries in big_table when an entry in secondary_table
>>>> is deleted.
Re: [sqlite] schema design question
I see. Fortunately my application simplifies this since the database is created once and read many times, but is never modified after creation time. Regarding constraints, I was thinking it might be helpful to add a few where applicable (whether foreign key constraints or even simple uniqueness constraints), basically as assertions, but I was worried about the overhead it would take to enforce them when I'm creating the database. Do you know if a uniqueness constraint, for instance, does an O(N) search on each insertion? If so, it sounds prohibitive.
Thanks,
Jeff

Richard Klein wrote:
> Jeff,
>
> I think that's the right way to go for your application. There are a few
> things you should be aware of regarding this approach.
>
> A column in one table that references a column in another table is called
> a "foreign key" in database lingo.
>
> An issue with foreign keys is that it is important to keep the referencing
> table (big_table) in sync with the referenced table (secondary_table).
>
> For example, if you delete an entry from secondary_table, you want to
> update the foreign key column in all entries in big_table that reference
> that entry.
>
> What's the proper way to update the foreign key? It depends on your
> application. You might want to set the foreign key in the referencing
> entries to NULL, or you might want to delete the referencing entries, or
> you might want to do something else.
>
> In standard, full-blown SQL, you can define the synchronization behavior
> you want with a "foreign key constraint".
> That is, you might create big_table as follows:
>
> CREATE TABLE big_table (
>     id   INTEGER PRIMARY KEY,
>     col1 INTEGER,
>     col2 REAL,
>     col3 TEXT,
>     col4 BLOB,
>     col5 INTEGER,
>     CONSTRAINT col5_fk FOREIGN KEY(col5)
>         REFERENCES secondary_table(id) ON DELETE SET NULL
> );
>
> This would define col5 as a foreign key referencing the id column of
> secondary_table, and would specify that col5 should be set to NULL in all
> referencing entries in big_table when an entry in secondary_table is
> deleted.
>
> Unfortunately, SQLite does not implement foreign key constraints. More
> precisely, they don't cause syntax errors, but they aren't enforced.
> Therefore, you will have to implement the desired synchronization
> behavior yourself. Fortunately, this is easy to do with the use of
> TRIGGERs, which *are* implemented in SQLite.
>
> Here are some links that might be useful:
>
> Foreign keys: http://en.wikipedia.org/wiki/Foreign_key
> SQLite triggers: http://www.sqlite.org/lang_createtrigger.html
>
> Hope this helps,
> - Richard
>
>> Thanks! I'll give that a try.
>> Jeff
>>
>> Richard Klein wrote:
>>
>>>> Whether or not the secondary columns are needed is a function of one
>>>> of the primary columns. That function involves values from another
>>>> table, though, so the general case would require a join. That other
>>>> table is small, however, so I generally cache it outside the database.
>>>> Some pseudocode for my expected use would be something like:
>>>>
>>>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
>>>> while(step()) {
>>>>     if( F(primary_column_values) ) {
>>>>         Fetch secondary values
>>>>     }
>>>>     do something with primary and maybe secondary values;
>>>> }
>>>>
>>>> Where F would be implemented outside the database.
>>>> Thanks,
>>>> Jeff
>>>
>>> I assume that the primary SELECT shown above can be made suitably fast
>>> by creating the appropriate indices on big_table.
>>>
>>> If the secondary columns are kept in a separate secondary_table, and
>>> a fifth primary column is added that contains the ROWID of the
>>> appropriate entry in the secondary_table (or NULL if the
>>> secondary_table is not needed), then the "Fetch secondary values"
>>> operation should be very fast as well.
>>>
>>> It seems to me that this approach would be faster than a join, and
>>> would consume less space than an 8-column table containing mostly
>>> NULLs in the secondary columns.
>>>
>>> Of course, this approach would cost you some extra space, in the form
>>> of the 5th primary column containing the secondary ROWID.
>>>
>>> - Richard Klein
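[Editor's note] Since SQLite of that era parsed but did not enforce foreign key constraints, Richard's suggestion was to emulate the behavior with a trigger. A minimal sketch of emulating ON DELETE SET NULL this way (modern SQLite can enforce it natively with `PRAGMA foreign_keys = ON`); the column and trigger names are illustrative:

```python
import sqlite3

# Emulate "ON DELETE SET NULL" with a trigger, as suggested in the thread.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE secondary_table (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE big_table (id INTEGER PRIMARY KEY, col5 INTEGER);

CREATE TRIGGER col5_on_delete AFTER DELETE ON secondary_table
BEGIN
    UPDATE big_table SET col5 = NULL WHERE col5 = OLD.id;
END;

INSERT INTO secondary_table VALUES (1, 'referenced');
INSERT INTO big_table VALUES (10, 1);
DELETE FROM secondary_table WHERE id = 1;
""")

# The trigger fired on the DELETE and nulled out the dangling reference.
val = con.execute("SELECT col5 FROM big_table WHERE id = 10").fetchone()[0]
print(val)  # -> None
```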
Re: [sqlite] schema design question
Thanks! I'll give that a try.
Jeff

Richard Klein wrote:
>> Whether or not the secondary columns are needed is a function of one
>> of the primary columns. That function involves values from another
>> table, though, so the general case would require a join. That other
>> table is small, however, so I generally cache it outside the database.
>> Some pseudocode for my expected use would be something like:
>>
>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
>> while(step()) {
>>     if( F(primary_column_values) ) {
>>         Fetch secondary values
>>     }
>>     do something with primary and maybe secondary values;
>> }
>>
>> Where F would be implemented outside the database.
>> Thanks,
>> Jeff
>
> I assume that the primary SELECT shown above can be made suitably fast
> by creating the appropriate indices on big_table.
>
> If the secondary columns are kept in a separate secondary_table, and a
> fifth primary column is added that contains the ROWID of the appropriate
> entry in the secondary_table (or NULL if the secondary_table is not
> needed), then the "Fetch secondary values" operation should be very fast
> as well.
>
> It seems to me that this approach would be faster than a join, and would
> consume less space than an 8-column table containing mostly NULLs in the
> secondary columns.
>
> Of course, this approach would cost you some extra space, in the form of
> the 5th primary column containing the secondary ROWID.
>
> - Richard Klein
Re: [sqlite] schema design question
Whether or not the secondary columns are needed is a function of one of the primary columns. That function involves values from another table, though, so the general case would require a join. That other table is small, however, so I generally cache it outside the database. Some pseudocode for my expected use would be something like:

prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
while(step()) {
    if( F(primary_column_values) ) {
        Fetch secondary values
    }
    do something with primary and maybe secondary values;
}

Where F would be implemented outside the database.
Thanks,
Jeff

Richard Klein wrote:
>> I'm pretty new to databases, and I have a schema design question. I
>> don't know enough about the guts of how sqlite works to know how to
>> make some tradeoffs. I have a large (potentially millions of entries)
>> table and it has 4 columns which are needed for every entry, and 4 more
>> that are needed for about 10% of the entries. I'm trying to decide
>> whether I want one table with 8 columns and a bunch of NULLs, or two
>> tables with no NULLs that will require a join to get all 8 column
>> values. I assume this is a space/performance tradeoff, since I would
>> think searching one table would be a lot faster than doing a join, but
>> I'm not sure what the impact would be in terms of
>> disk/memory/performance of all those NULLs.
>> Does anybody have any suggestions?
>> Thanks,
>> Jeff
>
> Can you give us a little more information? Specifically, is there any
> way to tell, by looking at the 4 primary columns, that you are dealing
> with one of the 10% of entries that requires looking at the 4 secondary
> columns?
>
> - Richard Klein
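[Editor's note] The pseudocode above, combined with Richard's nullable-ROWID suggestion, can be sketched concretely. The `c1`..`c8` and `sec_id` names are placeholders, not from the original posts:

```python
import sqlite3

# Split-table layout: 4 always-present columns in big_table, 4 rarely
# needed columns in secondary_table, linked by a nullable sec_id.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE secondary_table (id INTEGER PRIMARY KEY, c5 INTEGER,
                              c6 INTEGER, c7 INTEGER, c8 INTEGER);
CREATE TABLE big_table (id INTEGER PRIMARY KEY, c1 INTEGER, c2 INTEGER,
                        c3 INTEGER, c4 INTEGER, sec_id INTEGER);
INSERT INTO secondary_table VALUES (1, 5, 6, 7, 8);
INSERT INTO big_table VALUES (100, 1, 2, 3, 4, 1);    -- has secondary values
INSERT INTO big_table VALUES (101, 1, 2, 3, 4, NULL); -- the ~90% case
""")

hits = []
for pid, c1, sec_id in con.execute(
        "SELECT id, c1, sec_id FROM big_table WHERE c1 = 1"):
    if sec_id is not None:  # only the ~10% of rows need the second lookup
        extra = con.execute("SELECT c5, c6, c7, c8 FROM secondary_table "
                            "WHERE id = ?", (sec_id,)).fetchone()
        hits.append((pid, extra))
print(hits)  # -> [(100, (5, 6, 7, 8))]
```

The NULL test in application code plays the role of `F(primary_column_values)` in the pseudocode: the secondary fetch happens only for rows that actually have secondary data.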
[sqlite] schema design question
I'm pretty new to databases, and I have a schema design question. I don't know enough about the guts of how sqlite works to know how to make some tradeoffs. I have a large (potentially millions of entries) table and it has 4 columns which are needed for every entry, and 4 more that are needed for about 10% of the entries. I'm trying to decide whether I want one table with 8 columns and a bunch of NULLs, or two tables with no NULLs that will require a join to get all 8 column values. I assume this is a space/performance tradeoff, since I would think searching one table would be a lot faster than doing a join, but I'm not sure what the impact would be in terms of disk/memory/performance of all those NULLs.
Does anybody have any suggestions?
Thanks,
Jeff