Re: [sqlite] schema design question

2008-04-12 Thread Jay A. Kreibich
On Fri, Apr 11, 2008 at 01:54:43PM -0700, Richard Klein scratched on the wall:
> Jay A. Kreibich wrote:
> >On Thu, Apr 10, 2008 at 05:58:59PM -0700, Richard Klein scratched on the 
> >wall:
> >
> >>My advice would be to try it and see.  If table creation takes too long,
> >>you can always remove the UNIQUE constraint, and then write a routine to
> >>check the table for uniqueness after it's created.
> >
> >  That "routine" could simply be creating an explicit unique index on
> >  the column after all the data is loaded.  If the index can be
> >  created, you're good to go.
> >
> >  This is still a slow process, but I know the creation of indexes on
> >  existing tables is one area the development team hopes to speed up.
> >  I'm not sure what the priority of that is, however.
> 
> Wouldn't this take just as long as creating the index immediately after
> creating the table, and then letting each INSERT update the index?

  There are savings to be found when you have the whole set of indexes
  available to you, just as you get some savings from batching up
  INSERTs.  If you have a large enough page cache, you might be able to
  do a substantial part of the sort in one disk pass.  If the sorter has
  the whole set avaliable, there's also the issue of how you deal
  with disk management and what sorting algorithm is used.  Current
  thought is that there is room for improvement:

  http://www.mail-archive.com/sqlite-users@sqlite.org/msg31481.html

  Although that obviously doesn't help in the here and now.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] schema design question

2008-04-11 Thread Jeff Gibson
Thanks for all the suggestions.  My schema is now a lot cleaner, and my 
application runs 30% faster!
Jeff

Richard Klein wrote:
>> Jeff Gibson wrote:
>> 
>>> One thing your earlier suggestion brought up.  The way I was hooking up 
>>> tables before was something along the lines of:
>>>
>>> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, );
>>> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, );
>>> CREATE TABLE link(id1 INTEGER, id2 INTEGER);
>>>
>>> My understanding of your suggestion is:
>>>
>>> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, 
>>> );
>>> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, );
>>>
>>> with the understanding that id2 in primary will often be NULL.  Are 
>>> there any circumstances where the first alternative is more 
>>> appropriate?  I'm pretty new to databases, but I got my original idea 
>>> from a few schemas that I've seen.  I'm just trying to understand the 
>>> trade-offs.
>>> Thanks a lot for your help,
>>> Jeff
>>>
>>>   
>> These different forms of linking the records are used for different 
>> types of relations. The two tables can have records that are related in 
>> a various combinations of one or many to one or many.
>>
>>  one to one
>>  many to one
>>  one to many
>>  many to many
>>
>> Using a third table is required to implement a many to many relation. 
>> Each record in the third table stores one item of the relation (i.e 
>> which record in the first table is related to which record in the second 
>> table).
>>
>> A one to many relation is created by assigning an id to the record in 
>> the one side of the relation and referencing that id in a column on the 
>> many side of the relation. A many to one relation is the same a one to 
>> many relation, with the order of the tables reversed. This is what you 
>> have shown as Richard's suggestion.
>>
>> A one to one relation can be created by assigning an id to one record 
>> and using that same id as the primary key on the related record.
>>
>> For your case, you need a one to one relation between the primary and 
>> secondary tables. This can be done by using the same id for the related 
>> record in the secondary table as was assigned to the record in the 
>> primary table.
>>
>> CREATE TABLE primary(id INTEGER PRIMARY KEY, );
>> CREATE TABLE secondary(id INTEGER PRIMARY KEY, );
>>
>> insert into primary values(null, );
>> insert into secondary values(last_insert_rowid(), );
>>
>> When you want to retrieve the records you can use a join
>>
>> select * from primary join secondary using(id);
>>
>> or you can use a second select to retrieve the secondary fields using 
>> the id obtained from the primary field.
>>
>> select * from primary;
>> if (has_secondary())
>>  select * from secondary where id = primary.id;
>>
>> This does not waste any space storing unnecessary null fields. You 
>> should only resort the more complex foreign keys when you need to 
>> represent a more complex relation.
>>
>> HTH
>> Dennis Cote
>> 
>
> As Dennis points out, I had assumed that the relationship between the
> primary and secondary tables was many-to-one, i.e. that several entries
> in the primary table could refer to the same entry in the secondary
> table.
>
> If that is not the case -- if the relationship is in fact one-to-one --
> then Dennis's solution is the best one.
>
> I would use Dennis's two-SELECT approach rather than the join if speed
> is an issue.
>
> - Richard
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>   

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] schema design question

2008-04-11 Thread Richard Klein
Jay A. Kreibich wrote:
> On Thu, Apr 10, 2008 at 05:58:59PM -0700, Richard Klein scratched on the wall:
> 
>> My advice would be to try it and see.  If table creation takes too long,
>> you can always remove the UNIQUE constraint, and then write a routine to
>> check the table for uniqueness after it's created.
> 
>   That "routine" could simply be creating an explicit unique index on
>   the column after all the data is loaded.  If the index can be
>   created, you're good to go.
> 
>   This is still a slow process, but I know the creation of indexes on
>   existing tables is one area the development team hopes to speed up.
>   I'm not sure what the priority of that is, however.
> 
>-j
> 

Wouldn't this take just as long as creating the index immediately after
creating the table, and then letting each INSERT update the index?

Creating the index after all the data is loaded requires that the first
row be inserted into the index Btree, then the second row be inserted
into the Btree at the proper place, then the third row, and so on,
re-balancing the Btree as necessary.

Isn't this exactly the same process that takes place when you create
the index first and then insert each row?

- Richard
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] schema design question

2008-04-11 Thread Richard Klein
> Jeff Gibson wrote:
>> One thing your earlier suggestion brought up.  The way I was hooking up 
>> tables before was something along the lines of:
>>
>> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, );
>> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, );
>> CREATE TABLE link(id1 INTEGER, id2 INTEGER);
>>
>> My understanding of your suggestion is:
>>
>> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, 
>> );
>> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, );
>>
>> with the understanding that id2 in primary will often be NULL.  Are 
>> there any circumstances where the first alternative is more 
>> appropriate?  I'm pretty new to databases, but I got my original idea 
>> from a few schemas that I've seen.  I'm just trying to understand the 
>> trade-offs.
>> Thanks a lot for your help,
>> Jeff
>>
> 
> These different forms of linking the records are used for different 
> types of relations. The two tables can have records that are related in 
> a various combinations of one or many to one or many.
> 
>  one to one
>  many to one
>  one to many
>  many to many
> 
> Using a third table is required to implement a many to many relation. 
> Each record in the third table stores one item of the relation (i.e 
> which record in the first table is related to which record in the second 
> table).
> 
> A one to many relation is created by assigning an id to the record in 
> the one side of the relation and referencing that id in a column on the 
> many side of the relation. A many to one relation is the same a one to 
> many relation, with the order of the tables reversed. This is what you 
> have shown as Richard's suggestion.
> 
> A one to one relation can be created by assigning an id to one record 
> and using that same id as the primary key on the related record.
> 
> For your case, you need a one to one relation between the primary and 
> secondary tables. This can be done by using the same id for the related 
> record in the secondary table as was assigned to the record in the 
> primary table.
> 
> CREATE TABLE primary(id INTEGER PRIMARY KEY, );
> CREATE TABLE secondary(id INTEGER PRIMARY KEY, );
> 
> insert into primary values(null, );
> insert into secondary values(last_insert_rowid(), );
> 
> When you want to retrieve the records you can use a join
> 
> select * from primary join secondary using(id);
> 
> or you can use a second select to retrieve the secondary fields using 
> the id obtained from the primary field.
> 
> select * from primary;
> if (has_secondary())
>  select * from secondary where id = primary.id;
> 
> This does not waste any space storing unnecessary null fields. You 
> should only resort the more complex foreign keys when you need to 
> represent a more complex relation.
> 
> HTH
> Dennis Cote

As Dennis points out, I had assumed that the relationship between the
primary and secondary tables was many-to-one, i.e. that several entries
in the primary table could refer to the same entry in the secondary
table.

If that is not the case -- if the relationship is in fact one-to-one --
then Dennis's solution is the best one.

I would use Dennis's two-SELECT approach rather than the join if speed
is an issue.

- Richard
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] schema design question

2008-04-11 Thread Dennis Cote
Jeff Gibson wrote:
> 
> One thing your earlier suggestion brought up.  The way I was hooking up 
> tables before was something along the lines of:
> 
> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, );
> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, );
> CREATE TABLE link(id1 INTEGER, id2 INTEGER);
> 
> My understanding of your suggestion is:
> 
> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, 
> );
> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, );
> 
> with the understanding that id2 in primary will often be NULL.  Are 
> there any circumstances where the first alternative is more 
> appropriate?  I'm pretty new to databases, but I got my original idea 
> from a few schemas that I've seen.  I'm just trying to understand the 
> trade-offs.
> Thanks a lot for your help,
> Jeff
> 

These different forms of linking the records are used for different 
types of relations. The two tables can have records that are related in 
a various combinations of one or many to one or many.

 one to one
 many to one
 one to many
 many to many

Using a third table is required to implement a many to many relation. 
Each record in the third table stores one item of the relation (i.e 
which record in the first table is related to which record in the second 
table).

A one to many relation is created by assigning an id to the record in 
the one side of the relation and referencing that id in a column on the 
many side of the relation. A many to one relation is the same a one to 
many relation, with the order of the tables reversed. This is what you 
have shown as Richard's suggestion.

A one to one relation can be created by assigning an id to one record 
and using that same id as the primary key on the related record.

For your case, you need a one to one relation between the primary and 
secondary tables. This can be done by using the same id for the related 
record in the secondary table as was assigned to the record in the 
primary table.

CREATE TABLE primary(id INTEGER PRIMARY KEY, );
CREATE TABLE secondary(id INTEGER PRIMARY KEY, );

insert into primary values(null, );
insert into secondary values(last_insert_rowid(), );

When you want to retrieve the records you can use a join

select * from primary join secondary using(id);

or you can use a second select to retrieve the secondary fields using 
the id obtained from the primary field.

select * from primary;
if (has_secondary())
 select * from secondary where id = primary.id;

This does not waste any space storing unnecessary null fields. You 
should only resort the more complex foreign keys when you need to 
represent a more complex relation.

HTH
Dennis Cote
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] schema design question

2008-04-11 Thread Jay A. Kreibich
On Thu, Apr 10, 2008 at 05:58:59PM -0700, Richard Klein scratched on the wall:

> My advice would be to try it and see.  If table creation takes too long,
> you can always remove the UNIQUE constraint, and then write a routine to
> check the table for uniqueness after it's created.

  That "routine" could simply be creating an explicit unique index on
  the column after all the data is loaded.  If the index can be
  created, you're good to go.

  This is still a slow process, but I know the creation of indexes on
  existing tables is one area the development team hopes to speed up.
  I'm not sure what the priority of that is, however.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] schema design question

2008-04-10 Thread Richard Klein
Jeff -

I'm pretty new to databases myself, but I have seen examples of
schemas that resemble your first alternative, i.e. that involve
the creation of a third table containing just the linkages
between the primary and secondary tables.

And in fact, I think that is the right solution for your appli-
cation.  Although the link table duplicates the id1 column, it
contains no wasted (NULL) entries.

The second alternative (my proposal) doesn't duplicate id1, but
90% of the id2's are NULL.  Since the primary table is very large,
this represents an excessive waste of space.

- Richard


Jeff Gibson wrote:
> Right.  Hence my hesitation.  :-)  I suppose it's possible to check 
> uniqueness once at the end in O(N), but it would also take O(N) extra 
> storage, and I doubt sqlite is doing that...
> 
> One thing your earlier suggestion brought up.  The way I was hooking up 
> tables before was something along the lines of:
> 
> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, );
> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, );
> CREATE TABLE link(id1 INTEGER, id2 INTEGER);
> 
> My understanding of your suggestion is:
> 
> CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, 
> );
> CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, );
> 
> with the understanding that id2 in primary will often be NULL.  Are 
> there any circumstances where the first alternative is more 
> appropriate?  I'm pretty new to databases, but I got my original idea 
> from a few schemas that I've seen.  I'm just trying to understand the 
> trade-offs.
> Thanks a lot for your help,
> Jeff
> 
> Richard Klein wrote:
>> On second thought, checking the entire table for uniqueness would seem
>> to require O(N log N), regardless of whether it is done one INSERT at
>> a time, or all at once after the table is created!
>>
>> - Richard
>>
>> Richard Klein wrote:
>>   
>>> Quoting from the description of CREATE TABLE in the SQL Syntax section of
>>> the SQLite documentation:
>>>
>>> "The UNIQUE constraint causes an index to be created on the specified 
>>> columns.
>>> This index must contain unique keys."
>>>
>>> The creation of an index would seem to imply an O(log N) search on each
>>> insertion, so you should be okay.
>>>
>>> My advice would be to try it and see.  If table creation takes too long,
>>> you can always remove the UNIQUE constraint, and then write a routine to
>>> check the table for uniqueness after it's created.
>>>
>>> - Richard
>>>
>>>
>>> Jeff Gibson wrote:
>>> 
 I see.  Fortunately my application simplifies this since the database is 
 created once and read many times, but is never modified after creation 
 time.  Regarding constraints, I was thinking it might be helpful to add 
 a few where applicable (whether foreign key constraints or even simple 
 uniqueness constraints) basically as assertions, but I was worried about 
 the overhead it would take to enforce them when I'm creating the 
 database.  Do you know if a uniqueness constraint, for instance, does an 
 O(N) search on each insertion?  If so, it sounds prohibitive.
 Thanks,
 Jeff

 Richard Klein wrote:
   
> Jeff,
>
> I think that's the right way to go for your application.  There are a few
> things you should be aware of regarding this approach.
>
> A column in one table that references a column in another table is called
> a "foreign key" in database lingo.
>
> An issue with foreign keys is that it is important to keep the referencing
> table (big_table) in sync with the referenced table (secondary_table).
>
> For example, if you delete an entry from secondary_table, you want to 
> update
> the foreign key column in all entries in big_table that reference that 
> entry.
>
> What's the proper way to update the foreign key?  It depends on your 
> appli-
> cation.  You might want to set the foreign key in the referencing entries 
> to
> NULL, or you might want to delete the referencing entries, or you might 
> want
> to do something else.
>
> In standard, full-blown SQL, you can define the synchronization behavior 
> you
> want with a "foreign key constraint".  That is, you might create big_table
> as follows:
>
> CREATE TABLE big_table (
> idINTEGER PRIMARY KEY,
> col1  INTEGER,
> col2  REAL,
> col3  TEXT,
> col4  BLOB,
> col5  INTEGER,
> CONSTRAINT col5_fk FOREIGN KEY(col5)
>REFERENCES secondary_table(id) ON DELETE SET NULL,
> );
>
> This would define col5 as a foreign key referencing the id column of 
> secondary_
> table, and would specify that col5 should be set to NULL in all 
> referencing
> entries in big_table when an entry in secondary_table is deleted.
>
> Unfortunately, SQLite does not implement foreign key constraints.  More 
> precisely,
> they 

Re: [sqlite] schema design question

2008-04-10 Thread Jeff Gibson
Right.  Hence my hesitation.  :-)  I suppose it's possible to check 
uniqueness once at the end in O(N), but it would also take O(N) extra 
storage, and I doubt sqlite is doing that...

One thing your earlier suggestion brought up.  The way I was hooking up 
tables before was something along the lines of:

CREATE TABLE primary(id1 INTEGER PRIMARY KEY, );
CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, );
CREATE TABLE link(id1 INTEGER, id2 INTEGER);

My understanding of your suggestion is:

CREATE TABLE primary(id1 INTEGER PRIMARY KEY, id2 INTEGER, 
);
CREATE TABLE secondary(id2 INTEGER PRIMARY KEY, );

with the understanding that id2 in primary will often be NULL.  Are 
there any circumstances where the first alternative is more 
appropriate?  I'm pretty new to databases, but I got my original idea 
from a few schemas that I've seen.  I'm just trying to understand the 
trade-offs.
Thanks a lot for your help,
Jeff

Richard Klein wrote:
> On second thought, checking the entire table for uniqueness would seem
> to require O(N log N), regardless of whether it is done one INSERT at
> a time, or all at once after the table is created!
>
> - Richard
>
> Richard Klein wrote:
>   
>> Quoting from the description of CREATE TABLE in the SQL Syntax section of
>> the SQLite documentation:
>>
>> "The UNIQUE constraint causes an index to be created on the specified 
>> columns.
>> This index must contain unique keys."
>>
>> The creation of an index would seem to imply an O(log N) search on each
>> insertion, so you should be okay.
>>
>> My advice would be to try it and see.  If table creation takes too long,
>> you can always remove the UNIQUE constraint, and then write a routine to
>> check the table for uniqueness after it's created.
>>
>> - Richard
>>
>>
>> Jeff Gibson wrote:
>> 
>>> I see.  Fortunately my application simplifies this since the database is 
>>> created once and read many times, but is never modified after creation 
>>> time.  Regarding constraints, I was thinking it might be helpful to add 
>>> a few where applicable (whether foreign key constraints or even simple 
>>> uniqueness constraints) basically as assertions, but I was worried about 
>>> the overhead it would take to enforce them when I'm creating the 
>>> database.  Do you know if a uniqueness constraint, for instance, does an 
>>> O(N) search on each insertion?  If so, it sounds prohibitive.
>>> Thanks,
>>> Jeff
>>>
>>> Richard Klein wrote:
>>>   
 Jeff,

 I think that's the right way to go for your application.  There are a few
 things you should be aware of regarding this approach.

 A column in one table that references a column in another table is called
 a "foreign key" in database lingo.

 An issue with foreign keys is that it is important to keep the referencing
 table (big_table) in sync with the referenced table (secondary_table).

 For example, if you delete an entry from secondary_table, you want to 
 update
 the foreign key column in all entries in big_table that reference that 
 entry.

 What's the proper way to update the foreign key?  It depends on your appli-
 cation.  You might want to set the foreign key in the referencing entries 
 to
 NULL, or you might want to delete the referencing entries, or you might 
 want
 to do something else.

 In standard, full-blown SQL, you can define the synchronization behavior 
 you
 want with a "foreign key constraint".  That is, you might create big_table
 as follows:

 CREATE TABLE big_table (
 idINTEGER PRIMARY KEY,
 col1  INTEGER,
 col2  REAL,
 col3  TEXT,
 col4  BLOB,
 col5  INTEGER,
 CONSTRAINT col5_fk FOREIGN KEY(col5)
REFERENCES secondary_table(id) ON DELETE SET NULL,
 );

 This would define col5 as a foreign key referencing the id column of 
 secondary_
 table, and would specify that col5 should be set to NULL in all referencing
 entries in big_table when an entry in secondary_table is deleted.

 Unfortunately, SQLite does not implement foreign key constraints.  More 
 precisely,
 they don't cause syntax errors, but they aren't enforced.  Therefore, you 
 will
 have to implement the desired synchronization behavior yourself.  
 Fortunately,
 this is easy to do with the use of TRIGGERs, which *are* implemented in 
 SQLite.

 Here are some links that might be useful:

 Foreign keys: http://en.wikipedia.org/wiki/Foreign_key
 SQLite triggers:  http://www.sqlite.org/lang_createtrigger.html

 Hope this helps,
 - Richard

   
 
> Thanks!  I'll give that a try.
>Jeff
>
> Richard Klein wrote:
> 
>   
>>> Whether or not the the secondary columns are needed is a function of 
>>> one 
>>> of the primary columns.  That 

Re: [sqlite] schema design question

2008-04-10 Thread Richard Klein
On second thought, checking the entire table for uniqueness would seem
to require O(N log N), regardless of whether it is done one INSERT at
a time, or all at once after the table is created!

- Richard

Richard Klein wrote:
> Quoting from the description of CREATE TABLE in the SQL Syntax section of
> the SQLite documentation:
> 
> "The UNIQUE constraint causes an index to be created on the specified columns.
> This index must contain unique keys."
> 
> The creation of an index would seem to imply an O(log N) search on each
> insertion, so you should be okay.
> 
> My advice would be to try it and see.  If table creation takes too long,
> you can always remove the UNIQUE constraint, and then write a routine to
> check the table for uniqueness after it's created.
> 
> - Richard
> 
> 
> Jeff Gibson wrote:
>> I see.  Fortunately my application simplifies this since the database is 
>> created once and read many times, but is never modified after creation 
>> time.  Regarding constraints, I was thinking it might be helpful to add 
>> a few where applicable (whether foreign key constraints or even simple 
>> uniqueness constraints) basically as assertions, but I was worried about 
>> the overhead it would take to enforce them when I'm creating the 
>> database.  Do you know if a uniqueness constraint, for instance, does an 
>> O(N) search on each insertion?  If so, it sounds prohibitive.
>> Thanks,
>> Jeff
>>
>> Richard Klein wrote:
>>> Jeff,
>>>
>>> I think that's the right way to go for your application.  There are a few
>>> things you should be aware of regarding this approach.
>>>
>>> A column in one table that references a column in another table is called
>>> a "foreign key" in database lingo.
>>>
>>> An issue with foreign keys is that it is important to keep the referencing
>>> table (big_table) in sync with the referenced table (secondary_table).
>>>
>>> For example, if you delete an entry from secondary_table, you want to update
>>> the foreign key column in all entries in big_table that reference that 
>>> entry.
>>>
>>> What's the proper way to update the foreign key?  It depends on your appli-
>>> cation.  You might want to set the foreign key in the referencing entries to
>>> NULL, or you might want to delete the referencing entries, or you might want
>>> to do something else.
>>>
>>> In standard, full-blown SQL, you can define the synchronization behavior you
>>> want with a "foreign key constraint".  That is, you might create big_table
>>> as follows:
>>>
>>> CREATE TABLE big_table (
>>> idINTEGER PRIMARY KEY,
>>> col1  INTEGER,
>>> col2  REAL,
>>> col3  TEXT,
>>> col4  BLOB,
>>> col5  INTEGER,
>>> CONSTRAINT col5_fk FOREIGN KEY(col5)
>>>REFERENCES secondary_table(id) ON DELETE SET NULL,
>>> );
>>>
>>> This would define col5 as a foreign key referencing the id column of 
>>> secondary_
>>> table, and would specify that col5 should be set to NULL in all referencing
>>> entries in big_table when an entry in secondary_table is deleted.
>>>
>>> Unfortunately, SQLite does not implement foreign key constraints.  More 
>>> precisely,
>>> they don't cause syntax errors, but they aren't enforced.  Therefore, you 
>>> will
>>> have to implement the desired synchronization behavior yourself.  
>>> Fortunately,
>>> this is easy to do with the use of TRIGGERs, which *are* implemented in 
>>> SQLite.
>>>
>>> Here are some links that might be useful:
>>>
>>> Foreign keys: http://en.wikipedia.org/wiki/Foreign_key
>>> SQLite triggers:  http://www.sqlite.org/lang_createtrigger.html
>>>
>>> Hope this helps,
>>> - Richard
>>>
>>>   
 Thanks!  I'll give that a try.
Jeff

 Richard Klein wrote:
 
>> Whether or not the the secondary columns are needed is a function of one 
>> of the primary columns.  That function involves values from another 
>> table, though, so the general case would require a join.  That other 
>> table is small, however, so I generally cache it outside the database.  
>> Some pseudocode for my expected use would be something like:
>>
>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
>> while(step()) {
>>
>>if( F(primary_column_values) ) {
>>   Fetch secondary values
>>   }
>>
>> do something with primary and maybe secondary values;
>>
>> }
>>
>> Where F would be implemented outside the database.
>> Thanks,
>> Jeff
>> 
>> 
> I assume that the primary SELECT shown above can be made suitably fast
> by creating the appropriate indices on big_table.
>
> If the secondary columns are kept in a separate, secondary_table, and
> a fifth primary column is added that contains the ROWID of the approp-
> riate entry in the secondary_table (or NULL if the secondary_table is
> not needed), then the "Fetch secondary values" operation should be very
> fast as well.

Re: [sqlite] schema design question

2008-04-10 Thread Richard Klein
Quoting from the description of CREATE TABLE in the SQL Syntax section of
the SQLite documentation:

"The UNIQUE constraint causes an index to be created on the specified columns.
This index must contain unique keys."

The creation of an index would seem to imply an O(log N) search on each
insertion, so you should be okay.

My advice would be to try it and see.  If table creation takes too long,
you can always remove the UNIQUE constraint, and then write a routine to
check the table for uniqueness after it's created.

- Richard


Jeff Gibson wrote:
> I see.  Fortunately my application simplifies this since the database is 
> created once and read many times, but is never modified after creation 
> time.  Regarding constraints, I was thinking it might be helpful to add 
> a few where applicable (whether foreign key constraints or even simple 
> uniqueness constraints) basically as assertions, but I was worried about 
> the overhead it would take to enforce them when I'm creating the 
> database.  Do you know if a uniqueness constraint, for instance, does an 
> O(N) search on each insertion?  If so, it sounds prohibitive.
> Thanks,
> Jeff
> 
> Richard Klein wrote:
>> Jeff,
>>
>> I think that's the right way to go for your application.  There are a few
>> things you should be aware of regarding this approach.
>>
>> A column in one table that references a column in another table is called
>> a "foreign key" in database lingo.
>>
>> An issue with foreign keys is that it is important to keep the referencing
>> table (big_table) in sync with the referenced table (secondary_table).
>>
>> For example, if you delete an entry from secondary_table, you want to update
>> the foreign key column in all entries in big_table that reference that entry.
>>
>> What's the proper way to update the foreign key?  It depends on your appli-
>> cation.  You might want to set the foreign key in the referencing entries to
>> NULL, or you might want to delete the referencing entries, or you might want
>> to do something else.
>>
>> In standard, full-blown SQL, you can define the synchronization behavior you
>> want with a "foreign key constraint".  That is, you might create big_table
>> as follows:
>>
>> CREATE TABLE big_table (
>> idINTEGER PRIMARY KEY,
>> col1  INTEGER,
>> col2  REAL,
>> col3  TEXT,
>> col4  BLOB,
>> col5  INTEGER,
>> CONSTRAINT col5_fk FOREIGN KEY(col5)
>>REFERENCES secondary_table(id) ON DELETE SET NULL,
>> );
>>
>> This would define col5 as a foreign key referencing the id column of 
>> secondary_
>> table, and would specify that col5 should be set to NULL in all referencing
>> entries in big_table when an entry in secondary_table is deleted.
>>
>> Unfortunately, SQLite does not implement foreign key constraints.  More 
>> precisely,
>> they don't cause syntax errors, but they aren't enforced.  Therefore, you 
>> will
>> have to implement the desired synchronization behavior yourself.  
>> Fortunately,
>> this is easy to do with the use of TRIGGERs, which *are* implemented in 
>> SQLite.
>>
>> Here are some links that might be useful:
>>
>> Foreign keys: http://en.wikipedia.org/wiki/Foreign_key
>> SQLite triggers:  http://www.sqlite.org/lang_createtrigger.html
>>
>> Hope this helps,
>> - Richard
>>
>>   
>>> Thanks!  I'll give that a try.
>>>Jeff
>>>
>>> Richard Klein wrote:
>>> 
> Whether or not the the secondary columns are needed is a function of one 
> of the primary columns.  That function involves values from another 
> table, though, so the general case would require a join.  That other 
> table is small, however, so I generally cache it outside the database.  
> Some pseudocode for my expected use would be something like:
>
> prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
> while(step()) {
>
>if( F(primary_column_values) ) {
>   Fetch secondary values
>   }
>
> do something with primary and maybe secondary values;
>
> }
>
> Where F would be implemented outside the database.
> Thanks,
> Jeff
> 
> 
 I assume that the primary SELECT shown above can be made suitably fast
 by creating the appropriate indices on big_table.

 If the secondary columns are kept in a separate, secondary_table, and
 a fifth primary column is added that contains the ROWID of the approp-
 riate entry in the secondary_table (or NULL if the secondary_table is
 not needed), then the "Fetch secondary values" operation should be very
 fast as well.

 It seems to me that this approach would be faster than a join, and
 would consume less space than an 8-column table containing mostly
 NULLs in the secondary columns.

 Of course, this approach would cost you some extra space, in the form
 of the 5th primary column containing the secondary ROWID.

 - Richard Klein

 

Re: [sqlite] schema design question

2008-04-10 Thread Jeff Gibson
I see.  Fortunately my application simplifies this since the database is 
created once and read many times, but is never modified after creation 
time.  Regarding constraints, I was thinking it might be helpful to add 
a few where applicable (whether foreign key constraints or even simple 
uniqueness constraints) basically as assertions, but I was worried about 
the overhead it would take to enforce them when I'm creating the 
database.  Do you know if a uniqueness constraint, for instance, does an 
O(N) search on each insertion?  If so, it sounds prohibitive.
Thanks,
Jeff

Richard Klein wrote:
> Jeff,
>
> I think that's the right way to go for your application.  There are a few
> things you should be aware of regarding this approach.
>
> A column in one table that references a column in another table is called
> a "foreign key" in database lingo.
>
> An issue with foreign keys is that it is important to keep the referencing
> table (big_table) in sync with the referenced table (secondary_table).
>
> For example, if you delete an entry from secondary_table, you want to update
> the foreign key column in all entries in big_table that reference that entry.
>
> What's the proper way to update the foreign key?  It depends on your appli-
> cation.  You might want to set the foreign key in the referencing entries to
> NULL, or you might want to delete the referencing entries, or you might want
> to do something else.
>
> In standard, full-blown SQL, you can define the synchronization behavior you
> want with a "foreign key constraint".  That is, you might create big_table
> as follows:
>
> CREATE TABLE big_table (
> idINTEGER PRIMARY KEY,
> col1  INTEGER,
> col2  REAL,
> col3  TEXT,
> col4  BLOB,
> col5  INTEGER,
> CONSTRAINT col5_fk FOREIGN KEY(col5)
>REFERENCES secondary_table(id) ON DELETE SET NULL,
> );
>
> This would define col5 as a foreign key referencing the id column of 
> secondary_
> table, and would specify that col5 should be set to NULL in all referencing
> entries in big_table when an entry in secondary_table is deleted.
>
> Unfortunately, SQLite does not implement foreign key constraints.  More 
> precisely,
> they don't cause syntax errors, but they aren't enforced.  Therefore, you will
> have to implement the desired synchronization behavior yourself.  Fortunately,
> this is easy to do with the use of TRIGGERs, which *are* implemented in 
> SQLite.
>
> Here are some links that might be useful:
>
> Foreign keys: http://en.wikipedia.org/wiki/Foreign_key
> SQLite triggers:  http://www.sqlite.org/lang_createtrigger.html
>
> Hope this helps,
> - Richard
>
>   
>> Thanks!  I'll give that a try.
>>Jeff
>>
>> Richard Klein wrote:
>> 
 Whether or not the the secondary columns are needed is a function of one 
 of the primary columns.  That function involves values from another 
 table, though, so the general case would require a join.  That other 
 table is small, however, so I generally cache it outside the database.  
 Some pseudocode for my expected use would be something like:

 prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
 while(step()) {

if( F(primary_column_values) ) {
   Fetch secondary values
   }

 do something with primary and maybe secondary values;

 }

 Where F would be implemented outside the database.
 Thanks,
 Jeff
 
 
>>> I assume that the primary SELECT shown above can be made suitably fast
>>> by creating the appropriate indices on big_table.
>>>
>>> If the secondary columns are kept in a separate, secondary_table, and
>>> a fifth primary column is added that contains the ROWID of the approp-
>>> riate entry in the secondary_table (or NULL if the secondary_table is
>>> not needed), then the "Fetch secondary values" operation should be very
>>> fast as well.
>>>
>>> It seems to me that this approach would be faster than a join, and
>>> would consume less space than an 8-column table containing mostly
>>> NULLs in the secondary columns.
>>>
>>> Of course, this approach would cost you some extra space, in the form
>>> of the 5th primary column containing the secondary ROWID.
>>>
>>> - Richard Klein
>>>
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>   
>>>   
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>   

___
sqlite-users mailing list
sqlite-users@sqlite.org

Re: [sqlite] schema design question

2008-04-10 Thread Richard Klein
Jeff,

I think that's the right way to go for your application.  There are a few
things you should be aware of regarding this approach.

A column in one table that references a column in another table is called
a "foreign key" in database lingo.

An issue with foreign keys is that it is important to keep the referencing
table (big_table) in sync with the referenced table (secondary_table).

For example, if you delete an entry from secondary_table, you want to update
the foreign key column in all entries in big_table that reference that entry.

What's the proper way to update the foreign key?  It depends on your appli-
cation.  You might want to set the foreign key in the referencing entries to
NULL, or you might want to delete the referencing entries, or you might want
to do something else.

In standard, full-blown SQL, you can define the synchronization behavior you
want with a "foreign key constraint".  That is, you might create big_table
as follows:

CREATE TABLE big_table (
idINTEGER PRIMARY KEY,
col1  INTEGER,
col2  REAL,
col3  TEXT,
col4  BLOB,
col5  INTEGER,
CONSTRAINT col5_fk FOREIGN KEY(col5)
   REFERENCES secondary_table(id) ON DELETE SET NULL,
);

This would define col5 as a foreign key referencing the id column of secondary_
table, and would specify that col5 should be set to NULL in all referencing
entries in big_table when an entry in secondary_table is deleted.

Unfortunately, SQLite does not implement foreign key constraints.  More 
precisely,
they don't cause syntax errors, but they aren't enforced.  Therefore, you will
have to implement the desired synchronization behavior yourself.  Fortunately,
this is easy to do with the use of TRIGGERs, which *are* implemented in SQLite.

Here are some links that might be useful:

Foreign keys: http://en.wikipedia.org/wiki/Foreign_key
SQLite triggers:  http://www.sqlite.org/lang_createtrigger.html

Hope this helps,
- Richard

> Thanks!  I'll give that a try.
>Jeff
> 
> Richard Klein wrote:
>>> Whether or not the the secondary columns are needed is a function of one 
>>> of the primary columns.  That function involves values from another 
>>> table, though, so the general case would require a join.  That other 
>>> table is small, however, so I generally cache it outside the database.  
>>> Some pseudocode for my expected use would be something like:
>>>
>>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
>>> while(step()) {
>>>
>>>if( F(primary_column_values) ) {
>>>   Fetch secondary values
>>>   }
>>>
>>> do something with primary and maybe secondary values;
>>>
>>> }
>>>
>>> Where F would be implemented outside the database.
>>> Thanks,
>>> Jeff
>>> 
>> I assume that the primary SELECT shown above can be made suitably fast
>> by creating the appropriate indices on big_table.
>>
>> If the secondary columns are kept in a separate, secondary_table, and
>> a fifth primary column is added that contains the ROWID of the approp-
>> riate entry in the secondary_table (or NULL if the secondary_table is
>> not needed), then the "Fetch secondary values" operation should be very
>> fast as well.
>>
>> It seems to me that this approach would be faster than a join, and
>> would consume less space than an 8-column table containing mostly
>> NULLs in the secondary columns.
>>
>> Of course, this approach would cost you some extra space, in the form
>> of the 5th primary column containing the secondary ROWID.
>>
>> - Richard Klein
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>   
> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] schema design question

2008-04-10 Thread Jeff Gibson
Thanks!  I'll give that a try.
   Jeff

Richard Klein wrote:
>> Whether or not the the secondary columns are needed is a function of one 
>> of the primary columns.  That function involves values from another 
>> table, though, so the general case would require a join.  That other 
>> table is small, however, so I generally cache it outside the database.  
>> Some pseudocode for my expected use would be something like:
>>
>> prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
>> while(step()) {
>>
>>if( F(primary_column_values) ) {
>>   Fetch secondary values
>>   }
>>
>> do something with primary and maybe secondary values;
>>
>> }
>>
>> Where F would be implemented outside the database.
>> Thanks,
>> Jeff
>> 
>
> I assume that the primary SELECT shown above can be made suitably fast
> by creating the appropriate indices on big_table.
>
> If the secondary columns are kept in a separate, secondary_table, and
> a fifth primary column is added that contains the ROWID of the approp-
> riate entry in the secondary_table (or NULL if the secondary_table is
> not needed), then the "Fetch secondary values" operation should be very
> fast as well.
>
> It seems to me that this approach would be faster than a join, and
> would consume less space than an 8-column table containing mostly
> NULLs in the secondary columns.
>
> Of course, this approach would cost you some extra space, in the form
> of the 5th primary column containing the secondary ROWID.
>
> - Richard Klein
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>   

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] schema design question

2008-04-10 Thread Richard Klein
> Whether or not the the secondary columns are needed is a function of one 
> of the primary columns.  That function involves values from another 
> table, though, so the general case would require a join.  That other 
> table is small, however, so I generally cache it outside the database.  
> Some pseudocode for my expected use would be something like:
> 
> prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
> while(step()) {
> 
>if( F(primary_column_values) ) {
>   Fetch secondary values
>   }
> 
> do something with primary and maybe secondary values;
> 
> }
> 
> Where F would be implemented outside the database.
> Thanks,
> Jeff

I assume that the primary SELECT shown above can be made suitably fast
by creating the appropriate indices on big_table.

If the secondary columns are kept in a separate, secondary_table, and
a fifth primary column is added that contains the ROWID of the approp-
riate entry in the secondary_table (or NULL if the secondary_table is
not needed), then the "Fetch secondary values" operation should be very
fast as well.

It seems to me that this approach would be faster than a join, and
would consume less space than an 8-column table containing mostly
NULLs in the secondary columns.

Of course, this approach would cost you some extra space, in the form
of the 5th primary column containing the secondary ROWID.

- Richard Klein

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] schema design question

2008-04-10 Thread Jeff Gibson
Whether or not the the secondary columns are needed is a function of one 
of the primary columns.  That function involves values from another 
table, though, so the general case would require a join.  That other 
table is small, however, so I generally cache it outside the database.  
Some pseudocode for my expected use would be something like:

prepare("SELECT primary_columns FROM big_table WHERE some_criterion")
while(step()) {

   if( F(primary_column_values) ) {
  Fetch secondary values
  }

do something with primary and maybe secondary values;

}

Where F would be implemented outside the database.
Thanks,
Jeff



Richard Klein wrote:
>>   I'm pretty new to databases, and I have a schema design question.  I 
>> don't know enough about the guts of how sqlite works to know how to make 
>> some tradeoffs.  I have a large (potentially millions of entries) table 
>> and it has 4 columns which are needed for every entry, and 4 more that 
>> are needed for about 10% of the entries.  I'm trying to decide whether I 
>> want one table with 8 columns with a bunch of NULLs or two tables with 
>> no NULLs that will require a join to get all of the 8 column values.  I 
>> assume this is a space/performance tradeoff, since I would think 
>> searching one table would be a lot faster than doing a join, but I'm not 
>> sure what the impact would be in terms of disk/memory/performance of all 
>> those NULLs.
>>Does anybody have any suggestions?
>>Thanks,
>>Jeff
>> 
>
> Can you give us a little more information?  Specifically, is there any
> way to tell, by looking at the 4 primary columns, that you are dealing
> with one of the 10% entries that requires looking at the 4 secondary
> columns?
>
> - RichardKlein
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>   

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] schema design question

2008-04-10 Thread Richard Klein
>   I'm pretty new to databases, and I have a schema design question.  I 
> don't know enough about the guts of how sqlite works to know how to make 
> some tradeoffs.  I have a large (potentially millions of entries) table 
> and it has 4 columns which are needed for every entry, and 4 more that 
> are needed for about 10% of the entries.  I'm trying to decide whether I 
> want one table with 8 columns with a bunch of NULLs or two tables with 
> no NULLs that will require a join to get all of the 8 column values.  I 
> assume this is a space/performance tradeoff, since I would think 
> searching one table would be a lot faster than doing a join, but I'm not 
> sure what the impact would be in terms of disk/memory/performance of all 
> those NULLs.
>Does anybody have any suggestions?
>Thanks,
>Jeff

Can you give us a little more information?  Specifically, is there any
way to tell, by looking at the 4 primary columns, that you are dealing
with one of the 10% entries that requires looking at the 4 secondary
columns?

- RichardKlein

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] schema design question

2008-04-10 Thread Jeff Gibson
  I'm pretty new to databases, and I have a schema design question.  I 
don't know enough about the guts of how sqlite works to know how to make 
some tradeoffs.  I have a large (potentially millions of entries) table 
and it has 4 columns which are needed for every entry, and 4 more that 
are needed for about 10% of the entries.  I'm trying to decide whether I 
want one table with 8 columns with a bunch of NULLs or two tables with 
no NULLs that will require a join to get all of the 8 column values.  I 
assume this is a space/performance tradeoff, since I would think 
searching one table would be a lot faster than doing a join, but I'm not 
sure what the impact would be in terms of disk/memory/performance of all 
those NULLs.
   Does anybody have any suggestions?
   Thanks,
   Jeff
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users