Re: [sqlite] schema design question

Richard Klein Thu, 10 Apr 2008 19:22:02 -0700

Jeff -

I'm pretty new to databases myself, but I have seen examples of
schemas that resemble your first alternative, i.e. that involve
the creation of a third table containing just the linkages
between the primary and secondary tables.


And in fact, I think that is the right solution for your appli-
cation.  Although the link table duplicates the id1 column, it
contains no wasted (NULL) entries.

The second alternative (my proposal) doesn't duplicate id1, but
90% of the id2's are NULL.  Since the primary table is very large,
this represents an excessive waste of space.

- Richard


Jeff Gibson wrote:
> Right.  Hence my hesitation.  :-)  I suppose it's possible to check 
> uniqueness once at the end in O(N), but it would also take O(N) extra 
> storage, and I doubt sqlite is doing that...
> 
> One thing your earlier suggestion brought up.  The way I was hooking up 
> tables before was something along the lines of:
> 
> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, <primary_fields>);
> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, <secondary_fields>);
> CREATE TABLE link(id1 INTEGER, id2 INTEGER);
> 
> My understanding of your suggestion is:
> 
> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, 
> <primary_fields>);
> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, <secondary_fields>);
> 
> with the understanding that id2 in primary will often be NULL.  Are 
> there any circumstances where the first alternative is more 
> appropriate?  I'm pretty new to databases, but I got my original idea 
> from a few schemas that I've seen.  I'm just trying to understand the 
> trade-offs.
>     Thanks a lot for your help,
>     Jeff
> 
> Richard Klein wrote:
>> On second thought, checking the entire table for uniqueness would seem
>> to require O(N log N), regardless of whether it is done one INSERT at
>> a time, or all at once after the table is created!
>>
>> - Richard
>>
>> Richard Klein wrote:
>>   
>>> Quoting from the description of CREATE TABLE in the SQL Syntax section of
>>> the SQLite documentation:
>>>
>>> "The UNIQUE constraint causes an index to be created on the specified 
>>> columns.
>>> This index must contain unique keys."
>>>
>>> The creation of an index would seem to imply an O(log N) search on each
>>> insertion, so you should be okay.
>>>
>>> My advice would be to try it and see.  If table creation takes too long,
>>> you can always remove the UNIQUE constraint, and then write a routine to
>>> check the table for uniqueness after it's created.
>>>
>>> - Richard
>>>
>>>
>>> Jeff Gibson wrote:
>>>     
>>>> I see.  Fortunately my application simplifies this since the database is 
>>>> created once and read many times, but is never modified after creation 
>>>> time.  Regarding constraints, I was thinking it might be helpful to add 
>>>> a few where applicable (whether foreign key constraints or even simple 
>>>> uniqueness constraints) basically as assertions, but I was worried about 
>>>> the overhead it would take to enforce them when I'm creating the 
>>>> database.  Do you know if a uniqueness constraint, for instance, does an 
>>>> O(N) search on each insertion?  If so, it sounds prohibitive.
>>>>     Thanks,
>>>>     Jeff
>>>>
>>>> Richard Klein wrote:
>>>>       
>>>>> Jeff,
>>>>>
>>>>> I think that's the right way to go for your application.  There are a few
>>>>> things you should be aware of regarding this approach.
>>>>>
>>>>> A column in one table that references a column in another table is called
>>>>> a "foreign key" in database lingo.
>>>>>
>>>>> An issue with foreign keys is that it is important to keep the referencing
>>>>> table (big_table) in sync with the referenced table (secondary_table).
>>>>>
>>>>> For example, if you delete an entry from secondary_table, you want to 
>>>>> update
>>>>> the foreign key column in all entries in big_table that reference that 
>>>>> entry.
>>>>>
>>>>> What's the proper way to update the foreign key?  It depends on your 
>>>>> appli-
>>>>> cation.  You might want to set the foreign key in the referencing entries 
>>>>> to
>>>>> NULL, or you might want to delete the referencing entries, or you might 
>>>>> want
>>>>> to do something else.
>>>>>
>>>>> In standard, full-blown SQL, you can define the synchronization behavior 
>>>>> you
>>>>> want with a "foreign key constraint".  That is, you might create big_table
>>>>> as follows:
>>>>>
>>>>> CREATE TABLE big_table (
>>>>>     id    INTEGER PRIMARY KEY,
>>>>>     col1  INTEGER,
>>>>>     col2  REAL,
>>>>>     col3  TEXT,
>>>>>     col4  BLOB,
>>>>>     col5  INTEGER,
>>>>>     CONSTRAINT col5_fk FOREIGN KEY(col5)
>>>>>        REFERENCES secondary_table(id) ON DELETE SET NULL,
>>>>> );
>>>>>
>>>>> This would define col5 as a foreign key referencing the id column of 
>>>>> secondary_
>>>>> table, and would specify that col5 should be set to NULL in all 
>>>>> referencing
>>>>> entries in big_table when an entry in secondary_table is deleted.
>>>>>
>>>>> Unfortunately, SQLite does not implement foreign key constraints.  More 
>>>>> precisely,
>>>>> they don't cause syntax errors, but they aren't enforced.  Therefore, you 
>>>>> will
>>>>> have to implement the desired synchronization behavior yourself.  
>>>>> Fortunately,
>>>>> this is easy to do with the use of TRIGGERs, which *are* implemented in 
>>>>> SQLite.
>>>>>
>>>>> Here are some links that might be useful:
>>>>>
>>>>> Foreign keys:     http://en.wikipedia.org/wiki/Foreign_key
>>>>> SQLite triggers:  http://www.sqlite.org/lang_createtrigger.html
>>>>>
>>>>> Hope this helps,
>>>>> - Richard
>>>>>
>>>>>   
>>>>>         
>>>>>> Thanks!  I'll give that a try.
>>>>>>    Jeff
>>>>>>
>>>>>> Richard Klein wrote:
>>>>>>     
>>>>>>           
>>>>>>>> Whether or not the the secondary columns are needed is a function of 
>>>>>>>> one 
>>>>>>>> of the primary columns.  That function involves values from another 
>>>>>>>> table, though, so the general case would require a join.  That other 
>>>>>>>> table is small, however, so I generally cache it outside the database. 
>>>>>>>>  
>>>>>>>> Some pseudocode for my expected use would be something like:
>>>>>>>>
>>>>>>>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
>>>>>>>> while(step()) {
>>>>>>>>
>>>>>>>>    if( F(primary_column_values) ) {
>>>>>>>>       Fetch secondary values
>>>>>>>>   }
>>>>>>>>
>>>>>>>>     do something with primary and maybe secondary values;
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> Where F would be implemented outside the database.
>>>>>>>>     Thanks,
>>>>>>>>     Jeff
>>>>>>>>     
>>>>>>>>         
>>>>>>>>               
>>>>>>> I assume that the primary SELECT shown above can be made suitably fast
>>>>>>> by creating the appropriate indices on big_table.
>>>>>>>
>>>>>>> If the secondary columns are kept in a separate, secondary_table, and
>>>>>>> a fifth primary column is added that contains the ROWID of the approp-
>>>>>>> riate entry in the secondary_table (or NULL if the secondary_table is
>>>>>>> not needed), then the "Fetch secondary values" operation should be very
>>>>>>> fast as well.
>>>>>>>
>>>>>>> It seems to me that this approach would be faster than a join, and
>>>>>>> would consume less space than an 8-column table containing mostly
>>>>>>> NULLs in the secondary columns.
>>>>>>>
>>>>>>> Of course, this approach would cost you some extra space, in the form
>>>>>>> of the 5th primary column containing the secondary ROWID.
>>>>>>>
>>>>>>> - Richard Klein
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> sqlite-users mailing list
>>>>>>> [email protected]
>>>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>>>>   
>>>>>>>       
>>>>>>>             
>>>>>> _______________________________________________
>>>>>> sqlite-users mailing list
>>>>>> [email protected]
>>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>>>     
>>>>>>           
>>>>> _______________________________________________
>>>>> sqlite-users mailing list
>>>>> [email protected]
>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>>   
>>>>>         
>>>> _______________________________________________
>>>> sqlite-users mailing list
>>>> [email protected]
>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>       
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [email protected]
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>     
>> _______________________________________________
>> sqlite-users mailing list
>> [email protected]
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>   
> 
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] schema design question

Reply via email to