Re: [sqlite] seeking database design advice

2012-03-01 Thread Kees Nuyt
On Fri, 2 Mar 2012 13:26:23 +1100, BareFeetWare
 wrote:

> Hi Rich,
>
>> I wanted to solicit some opinions about how best to layout a database table 
>> for a project I am working on.  I have a distributed application that 
>> reports run time information back to a central machine.  Included in the 
>> report are the parameters used during runtime... approx 25 or so.  
>> Parameters are site specific and may be changed locally by whomever runs the 
>> application.  I have a table that records an application instance (columns 
>> such as location, version etc), and a table that records report instances 
>> (time, date, size).  I wanted to be able to query at a glance the most 
>> recent parameter set in use, so I came up with the following table design...
>> 
>> Design A)
>> primaryKey | appInstanceId | reportId | param1 | param2 | ... | param25
>
> so that would be:

[snipped for brevity]

> Note that this will only give you results that
> actually exist, no extra nulls etc that aren't
> needed. It will be listed as rows, not columns,
> which is far more manageable and predictable.
>
>> I realize design A may not be "best", but I
>> would prefer an friendly answer and not a
>> canned response like I get from co-workers.


> Hopefully this was friendly and leads you towards "enlightenment" ;-)
>
> Tom
>
>Tom Brodhurst-Hill
>BareFeetWare

Excellent analysis!

-- 
Regards,

Kees Nuyt

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] seeking database design advice

2012-03-01 Thread Simon Slavin

On 2 Mar 2012, at 3:44am, Mario Becroft  wrote:

> Making schema changes, such as adding or removing attributes, is not
> necessarily difficult. This is what DML is for. However one thing SQL
> does not handle well is temporal schemas, i.e. ones in effect during
> different periods of time. The main reason why you might need to use
> design B instead of A in this case is if you need to be aware of which
> fields were valid at past times or will be valid in the future; for
> instance, if you remove a field but you do not wish to lose the
> information previously reported while that field existed, or more
> subtly, if after adding a new field, you need to know that past reports
> explicitly did not include that field.

Reports change.  Data is eternal.  Your schema should reflect the data stored 
in it, not the format of the report you happen to want at the moment.

The people talking to you, Rich, were right.  But they were rubbish at 
explaining why.  One hint is that there's nothing special about the number 25.  
So any schema that allows you to immediately identify 25 should be treated with 
suspicion.  Use the two-table design, and don't forget SQLite's group_concat(X) 
function:

http://www.sqlite.org/lang_aggfunc.html

which allows you to gather all the data for one row of your report in one long 
string, as long as the structure you store it in is well thought out.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] seeking database design advice

2012-03-01 Thread Mario Becroft
I agree with what has been said so far regarding normalization, getting
the underlying structure right, and using views to access the data. I
would add a couple of points:

Database refactoring is not necessarily as hard as has been
suggested. In fact, a good design, separating data storage (tables) and
data access (views) will not only minimize the need for future changes,
but make such changes easy to do when required. For instance you can
change the tables while retaining the same views, or vice versa. Writing
queries to migrate data from old to new schemas without interrupting
operations (on databases suitably designed to enable this) is every-day
work for DBA's.

Normalizing your data is a good idea, but the question really is what is
normalized in this case?

Treating the set of fields and their names as data (as in your design B)
may be the best way. However, were this mode of reasoning taken too far,
you might end up with a database having no structure at all, besides
that implied by the data stored in it (which is open to
interpretation). The whole point of SQL is that it provides tools to
help specify the structure of your data.

Making schema changes, such as adding or removing attributes, is not
necessarily difficult. This is what DML is for. However one thing SQL
does not handle well is temporal schemas, i.e. ones in effect during
different periods of time. The main reason why you might need to use
design B instead of A in this case is if you need to be aware of which
fields were valid at past times or will be valid in the future; for
instance, if you remove a field but you do not wish to lose the
information previously reported while that field existed, or more
subtly, if after adding a new field, you need to know that past reports
explicitly did not include that field.

There are also other reasons why the field names seem more like data
than schema in this case and I would agree that, based on the
information provided so far, design B is almost certainly better. I am
just making the point that the question is a little more subtle than the
people saying 'you must normalize' have implied, and you are right to
try and better understand the reasoning behind that opinion.

-- 
Mario Becroft 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] seeking database design advice

2012-03-01 Thread BareFeetWare
Hi Rich,

> I wanted to solicit some opinions about how best to layout a database table 
> for a project I am working on.  I have a distributed application that reports 
> run time information back to a central machine.  Included in the report are 
> the parameters used during runtime... approx 25 or so.  Parameters are site 
> specific and may be changed locally by whomever runs the application.  I have 
> a table that records an application instance (columns such as location, 
> version etc), and a table that records report instances (time, date, size).  
> I wanted to be able to query at a glance the most recent parameter set in 
> use, so I came up with the following table design...
> 
> Design A)
> primaryKey | appInstanceId | reportId | param1 | param2 | ... | param25

so that would be:

create table "Application Instance"
(   appInstanceId integer primary key not null
,   name text not null collate nocase
,   location text not null collate nocase
,   version text not null collate nocase
)
;
create table "Report"
(   reportId integer primary key not null
,   datetime real not null
,   size integer
)
;
create table "Runtime"
(   primaryKey integer primary key not null
,   appInstanceId integer not null
,   reportId integer not null
,   param1
,   param2
,   param3
,   param4
,   param5
,   param6
,   param7
,   param8
,   param10
,   param11
,   param12
,   param13
,   param14
,   param15
,   param16
,   param17
,   param18
,   param19
,   param20
,   param21
,   param22
,   param23
,   param24
,   param25
)

> I spoke to a few people at work, and they all say the same thing... "this 
> isn't normalized'.  I get the following suggestion:
> 
> Design B)
> primaryKey | appInstanceId | reportId |paramNum |paramValue
> ---with a table second table--
> paramNum | paramDescription

Which would be:

create table "Application Instance"
(   appInstanceId integer primary key not null
,   name text not null collate nocase
,   location text not null collate nocase
,   version text not null collate nocase
)
;
create table "Report"
(   reportId integer primary key not null
,   datetime real not null
,   size integer
)
;
create table "Param"
(   paramNum integer primary key not null
,   paramDescription text
)
;
create table "Runtime"
(   primaryKey integer primary key not null
,   appInstanceId integer not null
references "Application Instance" (appInstanceId) on delete 
cascade on update cascade
,   reportId integer not null
references "Report" (reportId) on delete cascade on update 
cascade
,   paramNum integer not null
references "Param" (paramNum) on delete restrict on update 
cascade
,   paramValue not null
)
;

> The reason for Design B:
>   New parameters are easily added (no schema change)
>   It is "Better Design" (throwing around the word normalization a lot)

Agreed.

> My reasons for Design A:
>   1:1 record to report (not 1:25 records)
>   meta data overhead per row is 1/9 (opposed to 4/1)

Those aren't problems. You can use views and other mechanisms to view the data 
in whatever layout you like (so you don't have to manually cross reference). 
But the underlying data should be stored efficiently and, yes, normalized.

> I'm not a database guru, so I can't fight back...but I feel I am hearing a 
> line straight from a textbook.  Critics of my design simply tells me it is 
> wrong and keep repeating the mantra "It is not normalized".

Yes, that is frustrating. I've been there, years ago, even on this list, had 
"normalized" thrown at me without explanation. The best thing to do is research 
what normalized means. I am a convert now ;-) The "Definitive Guide to SQLite" 
book is a good place to start, in the chapter on "Normalization" (though read 
the previous pages first). In part it states:

>> Normalization concerns itself with the organization of attributes within 
>> relations so as to minimize duplication of data. Data duplication, as you 
>> will see, has more deleterious effects than just taking up unnecessary 
>> space. It increases the opportunity for database inconsistencies to arise. 
>> Normalization is about designing your database to be more resistant to the 
>> ill effects of thoughtless users and buggy programs. 

>> As stated, the chief aim of normalization is to eradicate duplication. 
>> Relations that have duplication removed are said to be normalized. However, 
>> there are degrees of normalization. These degrees are called normal forms.

>> First normal form simply states that all attributes in a relation use 
>> domains that are made up of atomic values..., meaning simply “that which 
>> cannot be broken down further.”

>> To understand second and third normal form, you have to first understand 
>> functional dependencies. The simple 

Re: [sqlite] seeking database design advice

2012-03-01 Thread Petite Abeille

On Mar 2, 2012, at 1:28 AM, Rich Rattanni wrote:

> I realize design A may not be "best", but I would prefer an friendly
> answer and not a canned response like I get from co-workers.

[warning: canned answer ahead]

FWIW, there is some good wisdom in going down the "text book" way, aka 
normalization :)

http://en.wikipedia.org/wiki/Database_normalization

On the other hand, if your main goal is reporting, then perhaps a different 
form of modelization might be more effective:

http://en.wikipedia.org/wiki/Dimensional_modeling

Either way, applying a bit of method (the "text book" part) has its benefits.

 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] seeking database design advice

2012-03-01 Thread Rich Rattanni
All:

I wanted to solicit some opinions about how best to layout a database
table for a project I am working on.  I have a distributed application
that reports run time information back to a central machine.  Included
in the report are the parameters used during runtime... approx 25 or
so.  Parameters are site specific and may be changed locally by
whomever runs the application.  I have a table that records an
application instance (columns such as location, version etc), and a
table that records report instances (time, date, size).  I wanted to
be able to query at a glance the most recent parameter set in use, so
I came up with the following table design...

Design A)
primaryKey | appInstanceId | reportId | param1 | param2 | ... | param25

I spoke to a few people at work, and they all say the same thing...
"this isn't normalized'.  I get the following suggestion:

Design B)
primaryKey | appInstanceId | reportId |paramNum |paramValue
---with a table second table--
paramNum | paramDescription

The reason for Design B:
   New parameters are easily added (no schema change)
   It is "Better Design" (throwing around the word normalization a lot)

My reasons for Design A:
   1:1 record to report (not 1:25 records)
   meta data overhead per row is 1/9 (opposed to 4/1)

I'm not a database guru, so I can't fight back...but I feel I am
hearing a line straight from a textbook.  Critics of my design simply
tells me it is wrong and keep repeating the mantra "It is not
normalized".  The way I see it the column itself describes the data,
which means less data is stored... efficient and simple.
I do see the appeal of Design A by effectively 'future-proofing' the
database table.  My application performing the inserts should not have
to change, I just always insert the parameters I find in each report.

I should add that I do expect future parameters to be added (maybe
5,10 more at most, but rather infrequently).  Old and new versions
will co-exist, but eventually all application instances should be
updated.  With Design A I would add new columns and set existing
records to NULL.  I would expect 100,000 reports a month or so.  When
I select from parameters I generally will show all parameters, so my
queries become "select * from tbl where appid=x and reportId=y" with 1
result set necessary (not iterating over 25 results).

I realize design A may not be "best", but I would prefer an friendly
answer and not a canned response like I get from co-workers.

Thanks in advance,
Rich
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users