Re: [sqlite] schema design question

Richard Klein Thu, 10 Apr 2008 18:06:43 -0700

On second thought, checking the entire table for uniqueness would seem
to require O(N log N), regardless of whether it is done one INSERT at
a time, or all at once after the table is created!


- Richard

Richard Klein wrote:
> Quoting from the description of CREATE TABLE in the SQL Syntax section of
> the SQLite documentation:
> 
> "The UNIQUE constraint causes an index to be created on the specified columns.
> This index must contain unique keys."
> 
> The creation of an index would seem to imply an O(log N) search on each
> insertion, so you should be okay.
> 
> My advice would be to try it and see.  If table creation takes too long,
> you can always remove the UNIQUE constraint, and then write a routine to
> check the table for uniqueness after it's created.
> 
> - Richard
> 
> 
> Jeff Gibson wrote:
>> I see.  Fortunately my application simplifies this since the database is 
>> created once and read many times, but is never modified after creation 
>> time.  Regarding constraints, I was thinking it might be helpful to add 
>> a few where applicable (whether foreign key constraints or even simple 
>> uniqueness constraints) basically as assertions, but I was worried about 
>> the overhead it would take to enforce them when I'm creating the 
>> database.  Do you know if a uniqueness constraint, for instance, does an 
>> O(N) search on each insertion?  If so, it sounds prohibitive.
>>     Thanks,
>>     Jeff
>>
>> Richard Klein wrote:
>>> Jeff,
>>>
>>> I think that's the right way to go for your application.  There are a few
>>> things you should be aware of regarding this approach.
>>>
>>> A column in one table that references a column in another table is called
>>> a "foreign key" in database lingo.
>>>
>>> An issue with foreign keys is that it is important to keep the referencing
>>> table (big_table) in sync with the referenced table (secondary_table).
>>>
>>> For example, if you delete an entry from secondary_table, you want to update
>>> the foreign key column in all entries in big_table that reference that 
>>> entry.
>>>
>>> What's the proper way to update the foreign key?  It depends on your appli-
>>> cation.  You might want to set the foreign key in the referencing entries to
>>> NULL, or you might want to delete the referencing entries, or you might want
>>> to do something else.
>>>
>>> In standard, full-blown SQL, you can define the synchronization behavior you
>>> want with a "foreign key constraint".  That is, you might create big_table
>>> as follows:
>>>
>>> CREATE TABLE big_table (
>>>     id    INTEGER PRIMARY KEY,
>>>     col1  INTEGER,
>>>     col2  REAL,
>>>     col3  TEXT,
>>>     col4  BLOB,
>>>     col5  INTEGER,
>>>     CONSTRAINT col5_fk FOREIGN KEY(col5)
>>>        REFERENCES secondary_table(id) ON DELETE SET NULL,
>>> );
>>>
>>> This would define col5 as a foreign key referencing the id column of 
>>> secondary_
>>> table, and would specify that col5 should be set to NULL in all referencing
>>> entries in big_table when an entry in secondary_table is deleted.
>>>
>>> Unfortunately, SQLite does not implement foreign key constraints.  More 
>>> precisely,
>>> they don't cause syntax errors, but they aren't enforced.  Therefore, you 
>>> will
>>> have to implement the desired synchronization behavior yourself.  
>>> Fortunately,
>>> this is easy to do with the use of TRIGGERs, which *are* implemented in 
>>> SQLite.
>>>
>>> Here are some links that might be useful:
>>>
>>> Foreign keys:     http://en.wikipedia.org/wiki/Foreign_key
>>> SQLite triggers:  http://www.sqlite.org/lang_createtrigger.html
>>>
>>> Hope this helps,
>>> - Richard
>>>
>>>   
>>>> Thanks!  I'll give that a try.
>>>>    Jeff
>>>>
>>>> Richard Klein wrote:
>>>>     
>>>>>> Whether or not the the secondary columns are needed is a function of one 
>>>>>> of the primary columns.  That function involves values from another 
>>>>>> table, though, so the general case would require a join.  That other 
>>>>>> table is small, however, so I generally cache it outside the database.  
>>>>>> Some pseudocode for my expected use would be something like:
>>>>>>
>>>>>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
>>>>>> while(step()) {
>>>>>>
>>>>>>    if( F(primary_column_values) ) {
>>>>>>       Fetch secondary values
>>>>>>   }
>>>>>>
>>>>>>     do something with primary and maybe secondary values;
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Where F would be implemented outside the database.
>>>>>>     Thanks,
>>>>>>     Jeff
>>>>>>     
>>>>>>         
>>>>> I assume that the primary SELECT shown above can be made suitably fast
>>>>> by creating the appropriate indices on big_table.
>>>>>
>>>>> If the secondary columns are kept in a separate, secondary_table, and
>>>>> a fifth primary column is added that contains the ROWID of the approp-
>>>>> riate entry in the secondary_table (or NULL if the secondary_table is
>>>>> not needed), then the "Fetch secondary values" operation should be very
>>>>> fast as well.
>>>>>
>>>>> It seems to me that this approach would be faster than a join, and
>>>>> would consume less space than an 8-column table containing mostly
>>>>> NULLs in the secondary columns.
>>>>>
>>>>> Of course, this approach would cost you some extra space, in the form
>>>>> of the 5th primary column containing the secondary ROWID.
>>>>>
>>>>> - Richard Klein
>>>>>
>>>>> _______________________________________________
>>>>> sqlite-users mailing list
>>>>> sqlite-users@sqlite.org
>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>>   
>>>>>       
>>>> _______________________________________________
>>>> sqlite-users mailing list
>>>> sqlite-users@sqlite.org
>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>     
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>   
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] schema design question

Reply via email to