Re: [sqlite] seeking database design advice
On Fri, 2 Mar 2012 13:26:23 +1100, BareFeetWarewrote: > Hi Rich, > >> I wanted to solicit some opinions about how best to layout a database table >> for a project I am working on. I have a distributed application that >> reports run time information back to a central machine. Included in the >> report are the parameters used during runtime... approx 25 or so. >> Parameters are site specific and may be changed locally by whomever runs the >> application. I have a table that records an application instance (columns >> such as location, version etc), and a table that records report instances >> (time, date, size). I wanted to be able to query at a glance the most >> recent parameter set in use, so I came up with the following table design... >> >> Design A) >> primaryKey | appInstanceId | reportId | param1 | param2 | ... | param25 > > so that would be: [snipped for brevity] > Note that this will only give you results that > actually exist, no extra nulls etc that aren't > needed. It will be listed as rows, not columns, > which is far more manageable and predictable. > >> I realize design A may not be "best", but I >> would prefer an friendly answer and not a >> canned response like I get from co-workers. > Hopefully this was friendly and leads you towards "enlightenment" ;-) > > Tom > >Tom Brodhurst-Hill >BareFeetWare Excellent analysis! -- Regards, Kees Nuyt ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] seeking database design advice
On 2 Mar 2012, at 3:44am, Mario Becroftwrote: > Making schema changes, such as adding or removing attributes, is not > necessarily difficult. This is what DML is for. However one thing SQL > does not handle well is temporal schemas, i.e. ones in effect during > different periods of time. The main reason why you might need to use > design B instead of A in this case is if you need to be aware of which > fields were valid at past times or will be valid in the future; for > instance, if you remove a field but you do not wish to lose the > information previously reported while that field existed, or more > subtly, if after adding a new field, you need to know that past reports > explicitly did not include that field. Reports change. Data is eternal. Your schema should reflect the data stored in it, not the format of the report you happen to want at the moment. The people talking to you, Rich, were right. But they were rubbish at explaining why. One hint is that there's nothing special about the number 25. So any schema that allows you to immediately identify 25 should be treated with suspicion. Use the two-table design, and don't forget SQLite's group_concat(X) function: http://www.sqlite.org/lang_aggfunc.html which allows you to gather all the data for one row of your report in one long string, as long as the structure you store it in is well thought out. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] seeking database design advice
I agree with what has been said so far regarding normalization, getting the underlying structure right, and using views to access the data. I would add a couple of points: Database refactoring is not necessarily as hard as has been suggested. In fact, a good design, separating data storage (tables) and data access (views) will not only minimize the need for future changes, but make such changes easy to do when required. For instance you can change the tables while retaining the same views, or vice versa. Writing queries to migrate data from old to new schemas without interrupting operations (on databases suitably designed to enable this) is every-day work for DBA's. Normalizing your data is a good idea, but the question really is what is normalized in this case? Treating the set of fields and their names as data (as in your design B) may be the best way. However, were this mode of reasoning taken too far, you might end up with a database having no structure at all, besides that implied by the data stored in it (which is open to interpretation). The whole point of SQL is that it provides tools to help specify the structure of your data. Making schema changes, such as adding or removing attributes, is not necessarily difficult. This is what DML is for. However one thing SQL does not handle well is temporal schemas, i.e. ones in effect during different periods of time. The main reason why you might need to use design B instead of A in this case is if you need to be aware of which fields were valid at past times or will be valid in the future; for instance, if you remove a field but you do not wish to lose the information previously reported while that field existed, or more subtly, if after adding a new field, you need to know that past reports explicitly did not include that field. There are also other reasons why the field names seem more like data than schema in this case and I would agree that, based on the information provided so far, design B is almost certainly better. I am just making the point that the question is a little more subtle than the people saying 'you must normalize' have implied, and you are right to try and better understand the reasoning behind that opinion. -- Mario Becroft___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] seeking database design advice
Hi Rich, > I wanted to solicit some opinions about how best to layout a database table > for a project I am working on. I have a distributed application that reports > run time information back to a central machine. Included in the report are > the parameters used during runtime... approx 25 or so. Parameters are site > specific and may be changed locally by whomever runs the application. I have > a table that records an application instance (columns such as location, > version etc), and a table that records report instances (time, date, size). > I wanted to be able to query at a glance the most recent parameter set in > use, so I came up with the following table design... > > Design A) > primaryKey | appInstanceId | reportId | param1 | param2 | ... | param25 so that would be: create table "Application Instance" ( appInstanceId integer primary key not null , name text not null collate nocase , location text not null collate nocase , version text not null collate nocase ) ; create table "Report" ( reportId integer primary key not null , datetime real not null , size integer ) ; create table "Runtime" ( primaryKey integer primary key not null , appInstanceId integer not null , reportId integer not null , param1 , param2 , param3 , param4 , param5 , param6 , param7 , param8 , param10 , param11 , param12 , param13 , param14 , param15 , param16 , param17 , param18 , param19 , param20 , param21 , param22 , param23 , param24 , param25 ) > I spoke to a few people at work, and they all say the same thing... "this > isn't normalized'. I get the following suggestion: > > Design B) > primaryKey | appInstanceId | reportId |paramNum |paramValue > ---with a table second table-- > paramNum | paramDescription Which would be: create table "Application Instance" ( appInstanceId integer primary key not null , name text not null collate nocase , location text not null collate nocase , version text not null collate nocase ) ; create table "Report" ( reportId integer primary key not null , datetime real not null , size integer ) ; create table "Param" ( paramNum integer primary key not null , paramDescription text ) ; create table "Runtime" ( primaryKey integer primary key not null , appInstanceId integer not null references "Application Instance" (appInstanceId) on delete cascade on update cascade , reportId integer not null references "Report" (reportId) on delete cascade on update cascade , paramNum integer not null references "Param" (paramNum) on delete restrict on update cascade , paramValue not null ) ; > The reason for Design B: > New parameters are easily added (no schema change) > It is "Better Design" (throwing around the word normalization a lot) Agreed. > My reasons for Design A: > 1:1 record to report (not 1:25 records) > meta data overhead per row is 1/9 (opposed to 4/1) Those aren't problems. You can use views and other mechanisms to view the data in whatever layout you like (so you don't have to manually cross reference). But the underlying data should be stored efficiently and, yes, normalized. > I'm not a database guru, so I can't fight back...but I feel I am hearing a > line straight from a textbook. Critics of my design simply tells me it is > wrong and keep repeating the mantra "It is not normalized". Yes, that is frustrating. I've been there, years ago, even on this list, had "normalized" thrown at me without explanation. The best thing to do is research what normalized means. I am a convert now ;-) The "Definitive Guide to SQLite" book is a good place to start, in the chapter on "Normalization" (though read the previous pages first). In part it states: >> Normalization concerns itself with the organization of attributes within >> relations so as to minimize duplication of data. Data duplication, as you >> will see, has more deleterious effects than just taking up unnecessary >> space. It increases the opportunity for database inconsistencies to arise. >> Normalization is about designing your database to be more resistant to the >> ill effects of thoughtless users and buggy programs. >> As stated, the chief aim of normalization is to eradicate duplication. >> Relations that have duplication removed are said to be normalized. However, >> there are degrees of normalization. These degrees are called normal forms. >> First normal form simply states that all attributes in a relation use >> domains that are made up of atomic values..., meaning simply “that which >> cannot be broken down further.” >> To understand second and third normal form, you have to first understand >> functional dependencies. The simple
Re: [sqlite] seeking database design advice
On Mar 2, 2012, at 1:28 AM, Rich Rattanni wrote: > I realize design A may not be "best", but I would prefer an friendly > answer and not a canned response like I get from co-workers. [warning: canned answer ahead] FWIW, there is some good wisdom in going down the "text book" way, aka normalization :) http://en.wikipedia.org/wiki/Database_normalization On the other hand, if your main goal is reporting, then perhaps a different form of modelization might be more effective: http://en.wikipedia.org/wiki/Dimensional_modeling Either way, applying a bit of method (the "text book" part) has its benefits. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] seeking database design advice
All: I wanted to solicit some opinions about how best to layout a database table for a project I am working on. I have a distributed application that reports run time information back to a central machine. Included in the report are the parameters used during runtime... approx 25 or so. Parameters are site specific and may be changed locally by whomever runs the application. I have a table that records an application instance (columns such as location, version etc), and a table that records report instances (time, date, size). I wanted to be able to query at a glance the most recent parameter set in use, so I came up with the following table design... Design A) primaryKey | appInstanceId | reportId | param1 | param2 | ... | param25 I spoke to a few people at work, and they all say the same thing... "this isn't normalized'. I get the following suggestion: Design B) primaryKey | appInstanceId | reportId |paramNum |paramValue ---with a table second table-- paramNum | paramDescription The reason for Design B: New parameters are easily added (no schema change) It is "Better Design" (throwing around the word normalization a lot) My reasons for Design A: 1:1 record to report (not 1:25 records) meta data overhead per row is 1/9 (opposed to 4/1) I'm not a database guru, so I can't fight back...but I feel I am hearing a line straight from a textbook. Critics of my design simply tells me it is wrong and keep repeating the mantra "It is not normalized". The way I see it the column itself describes the data, which means less data is stored... efficient and simple. I do see the appeal of Design A by effectively 'future-proofing' the database table. My application performing the inserts should not have to change, I just always insert the parameters I find in each report. I should add that I do expect future parameters to be added (maybe 5,10 more at most, but rather infrequently). Old and new versions will co-exist, but eventually all application instances should be updated. With Design A I would add new columns and set existing records to NULL. I would expect 100,000 reports a month or so. When I select from parameters I generally will show all parameters, so my queries become "select * from tbl where appid=x and reportId=y" with 1 result set necessary (not iterating over 25 results). I realize design A may not be "best", but I would prefer an friendly answer and not a canned response like I get from co-workers. Thanks in advance, Rich ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users