Re: [sqlite] schema design question

Jeff Gibson Thu, 10 Apr 2008 18:26:59 -0700

Right.  Hence my hesitation.  :-)  I suppose it's possible to check 
uniqueness once at the end in O(N), but it would also take O(N) extra 
storage, and I doubt sqlite is doing that...


One thing your earlier suggestion brought up.  The way I was hooking up 
tables before was something along the lines of:

CREATE TABLE primary(id1 INTEGER PRIMARY KEY, <primary_fields>);
CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, <secondary_fields>);
CREATE TABLE link(id1 INTEGER, id2 INTEGER);

My understanding of your suggestion is:

CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, 
<primary_fields>);
CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, <secondary_fields>);

with the understanding that id2 in primary will often be NULL.  Are 
there any circumstances where the first alternative is more 
appropriate?  I'm pretty new to databases, but I got my original idea 
from a few schemas that I've seen.  I'm just trying to understand the 
trade-offs.
    Thanks a lot for your help,
    Jeff

Richard Klein wrote:
> On second thought, checking the entire table for uniqueness would seem
> to require O(N log N), regardless of whether it is done one INSERT at
> a time, or all at once after the table is created!
>
> - Richard
>
> Richard Klein wrote:
>   
>> Quoting from the description of CREATE TABLE in the SQL Syntax section of
>> the SQLite documentation:
>>
>> "The UNIQUE constraint causes an index to be created on the specified 
>> columns.
>> This index must contain unique keys."
>>
>> The creation of an index would seem to imply an O(log N) search on each
>> insertion, so you should be okay.
>>
>> My advice would be to try it and see.  If table creation takes too long,
>> you can always remove the UNIQUE constraint, and then write a routine to
>> check the table for uniqueness after it's created.
>>
>> - Richard
>>
>>
>> Jeff Gibson wrote:
>>     
>>> I see.  Fortunately my application simplifies this since the database is 
>>> created once and read many times, but is never modified after creation 
>>> time.  Regarding constraints, I was thinking it might be helpful to add 
>>> a few where applicable (whether foreign key constraints or even simple 
>>> uniqueness constraints) basically as assertions, but I was worried about 
>>> the overhead it would take to enforce them when I'm creating the 
>>> database.  Do you know if a uniqueness constraint, for instance, does an 
>>> O(N) search on each insertion?  If so, it sounds prohibitive.
>>>     Thanks,
>>>     Jeff
>>>
>>> Richard Klein wrote:
>>>       
>>>> Jeff,
>>>>
>>>> I think that's the right way to go for your application.  There are a few
>>>> things you should be aware of regarding this approach.
>>>>
>>>> A column in one table that references a column in another table is called
>>>> a "foreign key" in database lingo.
>>>>
>>>> An issue with foreign keys is that it is important to keep the referencing
>>>> table (big_table) in sync with the referenced table (secondary_table).
>>>>
>>>> For example, if you delete an entry from secondary_table, you want to 
>>>> update
>>>> the foreign key column in all entries in big_table that reference that 
>>>> entry.
>>>>
>>>> What's the proper way to update the foreign key?  It depends on your appli-
>>>> cation.  You might want to set the foreign key in the referencing entries 
>>>> to
>>>> NULL, or you might want to delete the referencing entries, or you might 
>>>> want
>>>> to do something else.
>>>>
>>>> In standard, full-blown SQL, you can define the synchronization behavior 
>>>> you
>>>> want with a "foreign key constraint".  That is, you might create big_table
>>>> as follows:
>>>>
>>>> CREATE TABLE big_table (
>>>>     id    INTEGER PRIMARY KEY,
>>>>     col1  INTEGER,
>>>>     col2  REAL,
>>>>     col3  TEXT,
>>>>     col4  BLOB,
>>>>     col5  INTEGER,
>>>>     CONSTRAINT col5_fk FOREIGN KEY(col5)
>>>>        REFERENCES secondary_table(id) ON DELETE SET NULL,
>>>> );
>>>>
>>>> This would define col5 as a foreign key referencing the id column of 
>>>> secondary_
>>>> table, and would specify that col5 should be set to NULL in all referencing
>>>> entries in big_table when an entry in secondary_table is deleted.
>>>>
>>>> Unfortunately, SQLite does not implement foreign key constraints.  More 
>>>> precisely,
>>>> they don't cause syntax errors, but they aren't enforced.  Therefore, you 
>>>> will
>>>> have to implement the desired synchronization behavior yourself.  
>>>> Fortunately,
>>>> this is easy to do with the use of TRIGGERs, which *are* implemented in 
>>>> SQLite.
>>>>
>>>> Here are some links that might be useful:
>>>>
>>>> Foreign keys:     http://en.wikipedia.org/wiki/Foreign_key
>>>> SQLite triggers:  http://www.sqlite.org/lang_createtrigger.html
>>>>
>>>> Hope this helps,
>>>> - Richard
>>>>
>>>>   
>>>>         
>>>>> Thanks!  I'll give that a try.
>>>>>    Jeff
>>>>>
>>>>> Richard Klein wrote:
>>>>>     
>>>>>           
>>>>>>> Whether or not the the secondary columns are needed is a function of 
>>>>>>> one 
>>>>>>> of the primary columns.  That function involves values from another 
>>>>>>> table, though, so the general case would require a join.  That other 
>>>>>>> table is small, however, so I generally cache it outside the database.  
>>>>>>> Some pseudocode for my expected use would be something like:
>>>>>>>
>>>>>>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
>>>>>>> while(step()) {
>>>>>>>
>>>>>>>    if( F(primary_column_values) ) {
>>>>>>>       Fetch secondary values
>>>>>>>   }
>>>>>>>
>>>>>>>     do something with primary and maybe secondary values;
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> Where F would be implemented outside the database.
>>>>>>>     Thanks,
>>>>>>>     Jeff
>>>>>>>     
>>>>>>>         
>>>>>>>               
>>>>>> I assume that the primary SELECT shown above can be made suitably fast
>>>>>> by creating the appropriate indices on big_table.
>>>>>>
>>>>>> If the secondary columns are kept in a separate, secondary_table, and
>>>>>> a fifth primary column is added that contains the ROWID of the approp-
>>>>>> riate entry in the secondary_table (or NULL if the secondary_table is
>>>>>> not needed), then the "Fetch secondary values" operation should be very
>>>>>> fast as well.
>>>>>>
>>>>>> It seems to me that this approach would be faster than a join, and
>>>>>> would consume less space than an 8-column table containing mostly
>>>>>> NULLs in the secondary columns.
>>>>>>
>>>>>> Of course, this approach would cost you some extra space, in the form
>>>>>> of the 5th primary column containing the secondary ROWID.
>>>>>>
>>>>>> - Richard Klein
>>>>>>
>>>>>> _______________________________________________
>>>>>> sqlite-users mailing list
>>>>>> [email protected]
>>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>>>   
>>>>>>       
>>>>>>             
>>>>> _______________________________________________
>>>>> sqlite-users mailing list
>>>>> [email protected]
>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>>     
>>>>>           
>>>> _______________________________________________
>>>> sqlite-users mailing list
>>>> [email protected]
>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>   
>>>>         
>>> _______________________________________________
>>> sqlite-users mailing list
>>> [email protected]
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>       
>> _______________________________________________
>> sqlite-users mailing list
>> [email protected]
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>     
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>   

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] schema design question

Reply via email to