Re: [sqlite] Table within a table??
Darren Duncan wrote: > > Being that arrays *are* relations, you can use all the relational > operators on them. > Just to be totally clear - an array is not a relation. An array has fixed order of each dimension (eg columns and rows), and you address it by position. A relation is unordered (although you can use a query language like SQL to produce ordered tables), and you address it by data values. The rationale was that a user of the information wouldn't have to dig around to find out that Field 1 is Q1 sales, Field 2 is Q2 sales, Field 5 is Actual Sales to Date and Field 6 is Last Year Total Sales etc. Q2 Sales would have to have a name, although there's nothing to stop someone calling it SXVAT_Q2_PreCV2. It's better than nothing. Nothing's perfect and it seems no-one can say what a domain should contain but that's no excuse to throw the baby out with the bath water. Stick as close to relational thinking as you can, and your design will be that much more effective. -- View this message in context: http://old.nabble.com/Table-within-a-table---tp26125451p26199506.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Table within a table??
> Jay A. Kreibich wrote: >> On Tue, Nov 03, 2009 at 02:51:23AM -0800, CityDev scratched on the wall: >>> That just seems so contrary to the original idea of the relational >>> model that you shouldn't have any data whose meaning is not defined >>> by data (in the case of an array you need to understand the >>> significance of relative position - remember relations have no row >>> or column order to stop you playing that game). > > So what is a character string then? An *ordered* sequence of characters. And > yet this coexists just fine with the relational model. An "array" is just a > generalization of this concept. Except that the characters in a string lose all meaning when used individually. Arrays on the other hand, while ordered, generally hold sequences of data such that each element has substantial meaning individually, and may need to be queried against. In fact, even storing strings tends to cause problems, because often people want to query only a slice of a string and this often requires a full table scan. If you absolutely must store arrays, you can do this by storing them in blobs. If you need to query against the array elements, use a custom function (though, if you need to query against the elements in the array, you should REALLY be using a relational table.) As far as order goes, it is reasonably easy to order the rows using a field for that purpose. John ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Table within a table??
Jay A. Kreibich wrote: > On Tue, Nov 03, 2009 at 02:51:23AM -0800, CityDev scratched on the wall: >> That just seems so contrary to the original idea of the relational >> model that you shouldn't have any data whose meaning is not defined >> by data (in the case of an array you need to understand the >> significance of relative position - remember relations have no row >> or column order to stop you playing that game). So what is a character string then? An *ordered* sequence of characters. And yet this coexists just fine with the relational model. An "array" is just a generalization of this concept. > Yes, EXACTLY. > > And so it is for nearly any other compound datatype. > > Arrays have implied ordering. You can't JOIN against an array > without extracting meta-data like that. Even if you do that, you > have to pair that data together and carry it through any manipulation. > If the data gets separated or lost, both sets of data becomes > worthless. Not Relational. Same is true of any other container > with an inherent ordering or listing (which is most of them). I see no problem supporting arrays in the relational model because you can define an array in terms of a relation. For example, an array can be any relation that has an explicit attribute, say call it "index", which defines the conceptual order of the elements, which are defined using the remaining relation attributes. Being that arrays *are* relations, you can use all the relational operators on them. And so an array-valued attribute *is* an RVA. Of course, you'll want to be careful in how you use the array-relations so that you keep the index-defining attribute where you need to. But this is the same as with any other important attributes, such as person name or person id or SIN whatever you need in the context. (Similarly, you can implement a bag/multiset over a relation by just having an explicit attribute to count instances, say call it "count".) Using the above methodology, it is perfectly valid to have an "order by" in the relational model; the result of an order-by operation is a relation that is like its main input relation but with an extra indexing attribute added. Note that the RANK operator one can find in math or SQL is essentially the same thing. And "limit" is just a slice/restriction/semijoin on a relation with an ordering attribute, that filters on the value of the indexing attribute being in a certain range. Such is one way that my Muldis D language supports the actual relational model while at the same time supporting all the useful things programmers want to do, and that SQL can do. >> So I still go back to point i) - you don't need to do this. > > I hope not, as my main point was that I feel they aren't allowed > under the Relational Model. I agree with your thoughts, I just think > there is a more rigid way of showing it to be true. > > But that doesn't mean people don't want them. Indeed. There is a lot of data where it makes sense to be ordered. But then, doing it properly in the relational model is encoding that desire as *data*, such as my above example does. > The point you're getting at is still valid, however. One aspect of > the Model is that a system can *physically* store the data in all > kinds of interesting ways because all the transforms are 100% > reversible. You don't really need to store the table, as defined, in > any physical way as long as you can compute it from the data you've > stored. RVAs, as a data modeling idea, fit right into that because > the only real difference between an RVA and a non-RVA is how you > write the data out on the display. Absolutely, and that's one of the best features of the relational model. That it gives users a lot of flexibility while implementations have a lot of flexibility to optimize behind the scenes, while they can be confident this optimization won't change the results users get. Not so much with SQL. -- Darren Duncan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Table within a table??
On Tue, Nov 03, 2009 at 02:51:23AM -0800, CityDev scratched on the wall: > > Whilst it's true that SQL isn't essential for a relational database More to the point, any database that supports SQL does not, and cannot, support the data typing and data manipulation rules set out by the Relational Model. The Relational Model defines a set of structure and manipulation rules. It is a mathematical system. It does not define any kind of query language or symbolic expression system-- that is outside of the scope of the Model. In fact, almost by definition a query language has to be outside of the scope of the Model in order to be practical (for example, some kind of ORDER BY option in a query). The SQL standard defines not only the language syntax, but also defines it's own data typing and data manipulation rules. It defines it's own mathematical system-- kind of. The data manipulation environment defined by SQL is not really rigid or self-consistent enough to call a "mathematical system." The SQL system is strongly based off the Relational Model, but it is fundamentally incompatible in many small but important ways. So even if we ignore "SQL the language", what we have left in "SQL the data manipulation environment" is not the Relational Model. Siblings, perhaps, but not the same. So when you said, "SQL is a first-order language so it can't easily handle substructure," as a point of argument, my feeling is that A) that might be true, B) but it doesn't matter, C) and it really doesn't matter. A) Sure, why not. SQL can't easily handle a lot of the things it tries to do. But since the computational environment is not defined by the language, it B) doesn't really matter. Just because my fancy calculator doesn't have a way to express and generate 4D graphs doesn't mean 4D geometry doesn't exist as a full and mature system, and isn't useful for somethings. It only means I need a better calculator (with a *way* better display...) or I need to do it "by hand" using the rules of the environment directly. But, C) it really doesn't matter, because the context of the discussion was the Relational Model, and you can't really make Relational Model arguments when thinking in SQL. Like one of my computational professors used to say, "When you argue in English, you're wrong." It's kind of the same thing. You can't really make Relational Model arguments in SQL. > I'm interested in your remark that relational databases now cope with > 'arrays'. Didn't say that. I said several SQL databases do. Although looking through some documentation, it looks like that number is a bit less than I first thought. > That just seems so contrary to the original idea of the relational > model that you shouldn't have any data whose meaning is not defined > by data (in the case of an array you need to understand the > significance of relative position - remember relations have no row > or column order to stop you playing that game). Yes, EXACTLY. And so it is for nearly any other compound datatype. Arrays have implied ordering. You can't JOIN against an array without extracting meta-data like that. Even if you do that, you have to pair that data together and carry it through any manipulation. If the data gets separated or lost, both sets of data becomes worthless. Not Relational. Same is true of any other container with an inherent ordering or listing (which is most of them). What about sets? The Relational Model has a lot of set theory in it, but it isn't "pure" set theory. For example, many "true" set operations will result in sets of sets. Those operations in the Relational Model (applied to the sub-sections of relations) will always be "flattened" to just sets of elements. So sets don't work within the Relational Model unless you modify their behavior. If you do that, you no longer have "real" sets. You basically have... a relation. Or something that acts like a relation in pretty much every way that counts. So now you have a RVA, and we know those don't count because they can be seamlessly transformed into a non-RVA representation without losing data or meaning (or generating meta-data, which is what I meant by "meaning"). And, BTW, SQL does consider table/result-set columns to be ordered. One more fundamental difference that causes all kinds of headaches for query optimizers, because of issues like technically A NATURAL JOIN B is NOT equal to B NATURAL JOIN A. > So I still go back to point i) - you don't need to do this. I hope not, as my main point was that I feel they aren't allowed under the Relational Model. I agree with your thoughts, I just think there is a more rigid way of showing it to be true. But that doesn't mean people don't want them. So why are they there? Most SQL people aren't all that aware of the theory behind database
Re: [sqlite] Table within a table??
On Tue, Nov 3, 2009 at 4:51 AM, CityDevwrote: > I'm interested in your remark that relational databases now cope with > 'arrays'. Personally I've never seen that in DB2, Jet or SQLite. That just > seems so contrary to the original idea of the relational model that you > shouldn't have any data whose meaning is not defined by data (in the case of > an array you need to understand the significance of relative position - > remember relations have no row or column order to stop you playing that > game). http://www.postgresql.org/docs/8.2/interactive/arrays.html -- Puneet Kishor http://www.punkish.org Carbon Model http://carbonmodel.org Charter Member, Open Source Geospatial Foundation http://www.osgeo.org Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor Nelson Institute, UW-Madison http://www.nelson.wisc.edu --- Assertions are politics; backing up assertions with evidence is science === ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Table within a table??
Thank you for this response, Jay, and your other long one. They seem very well informed and helpful to the general practitioner. And "Database in Depth" is indeed a great book, and should be read by anyone wanting to understand databases; despite the name, it is actually quite approachable. Regarding "atomic" et al, my own impression of what Date says is slightly different. My perception is not that he tries to define "atomic" and then gives up, but rather that he emphasizes from the start that practically speaking there is no such thing. That is ... One main point in support of relation-valued attributes (in terms of which all other kinds of collections can be represented, such as arrays) is that rejecting them on the basis of only wanting atomic or scalar values doesn't make sense, because any value can be represented as a collection of other values. For example, a string can be a collection of characters, or a number can be a string of digits, etc. And so, forbidding collection values is making a very arbitrary limitation. This means that the simplest or more pure solution is to allow any value at all as an attribute value. I won't repeat what you said about relations without RVAs being losslessly convertible to relations with them, and back again likewise (it is like taking the math expression "5x + 10y" and changing it to "5 * (x + 2y)", then changing it back). But this is what I was referring to in my original message in the thread where I said using the 2 tables was logically equivalent to using one and making work_history an RVA. In regards to 1NF, as you say, relations are already in 1NF by definition, and only SQL tables are capable of not being in 1NF, since in the general case they are not relations. As far as I know the main distinction here is that SQL allows duplicate rows in a table, and so a SQL table is like a bag, while a relation by definition is a set and never has duplicate tuples (rows). In that respect, having a 1NF table in SQL is as simple as having a primary key or unique key on that table. But even then, some SQL rowsets may have duplicate rows depending on how a SELECT is built, while relations never would. Now, even if SQL isn't the same as the relational model, it has a lot in common, and to a large degree if one uses SQL the right way its behavior can be quite close to the relational model, and more correct and trouble-free. I highly recommend Chris Date's newer book, "SQL and Relational Theory: How to Write Accurate SQL Code" (2009), http://oreilly.com/catalog/9780596523084/ , which talks about how to use the SQL DBMSs you already have in the most effective way possible, in regards to getting the behavior you want error free. And yes, that is very applicable to SQLite specifically. -- Darren Duncan Jay A. Kreibich wrote: > On Sun, Nov 01, 2009 at 01:23:52PM -0800, CityDev scratched on the wall: >> Darren Duncan wrote: >>> Or at least it is in the version of the relational model >>> that allows non-scalar attribute values, but that is the one that Chris >>> Date et al, as well as myself ascribe to. >> I didn't read this through but I recall Chris Date defining a relational >> database as formed from relations normalised to 1NF. First Normal Form >> basically means no substructure ie each column is a single value from a >> scalar domain. > >> To say it would be nice to have structured data is ignoring >> two things: i) we've got this for just fine without it, and ii) SQL is a >> first-order language so it can't easily handle substructure. > > Let me address these two points first. As for "i", then please > explain to my why nearly every SQL database supports array values? > I'm not sure it is in the standard, but it is so common it might > as well be (for those wondering, no, SQLite does not support arrays). > The answer for "ii" is easy: SQL is not the Relational Model, so the > only point you might be making is that SQL makes for a poor relational > language. > > Now back to your first paragraph. This is the more interesting one. > > Executive Summary: Multi-valued attributes (column types) > are not allowed in the Relational Model. Relational-Valued > Attributes (tables in tables) are OK, however, because they > don't count. > > Now a few pages on why > > The term used by Date (and Codd) is "atomic", not scalar. Of course, > this is generally meaningless, as it is very hard to define what > makes a value "atomic." Are text values atomic? The SQL standard > includes a number of functions to extract, match, and manipulate > sub-strings. What about integers? SQLite includes bit operations > that can be applied to integer values. Does that make them non-atomic? > > In "Database In Depth", Date eventually just gives up trying to > define "atomic" in a formal sense, says it cannot and should not be > done. In his writings, Date points out
Re: [sqlite] Table within a table??
On Sun, Nov 01, 2009 at 01:23:52PM -0800, CityDev scratched on the wall: > > Darren Duncan wrote: > > > > Or at least it is in the version of the relational model > > that allows non-scalar attribute values, but that is the one that Chris > > Date et al, as well as myself ascribe to. > > I didn't read this through but I recall Chris Date defining a relational > database as formed from relations normalised to 1NF. First Normal Form > basically means no substructure ie each column is a single value from a > scalar domain. > To say it would be nice to have structured data is ignoring > two things: i) we've got this for just fine without it, and ii) SQL is a > first-order language so it can't easily handle substructure. Let me address these two points first. As for "i", then please explain to my why nearly every SQL database supports array values? I'm not sure it is in the standard, but it is so common it might as well be (for those wondering, no, SQLite does not support arrays). The answer for "ii" is easy: SQL is not the Relational Model, so the only point you might be making is that SQL makes for a poor relational language. Now back to your first paragraph. This is the more interesting one. Executive Summary: Multi-valued attributes (column types) are not allowed in the Relational Model. Relational-Valued Attributes (tables in tables) are OK, however, because they don't count. Now a few pages on why The term used by Date (and Codd) is "atomic", not scalar. Of course, this is generally meaningless, as it is very hard to define what makes a value "atomic." Are text values atomic? The SQL standard includes a number of functions to extract, match, and manipulate sub-strings. What about integers? SQLite includes bit operations that can be applied to integer values. Does that make them non-atomic? In "Database In Depth", Date eventually just gives up trying to define "atomic" in a formal sense, says it cannot and should not be done. In his writings, Date points out that even atoms are not "atomic," and can be broken down into quarks-- but that the idea of atomic values is still a good and powerful one, even if it is inexact. Given all that, it might come as a surprise that Codd and Date *do* support the idea that RVA (Relation-Valued Attribute) fit into the Relational Model. A few points on that. First, the whole concept of 1NF is from the SQL world. To put a table into 1NF is basically to make it mimic the requirements of a relation. Under the Relational Model, it is basically impossible to have a valid Relation that is not in 1NF. If it isn't in 1NF, it isn't a relation. So why allow RVAs? Well, to start with, under the Relational Model relations are atomic. Yes, really. Under the formal Model you're not allowed to change or alter any sub-value of a relation. A relational value is what it is, and the whole state of the relation is what represents its value. By taking the product of the column type domains, you can compute a known, finite domain for every possible value for a given relation (relations don't allow duplicate rows, so the number of rows is bound). That's typically a *very* large domain, but since it is known and finite, it is a valid type domain. As an extension of that idea, when you alter a column or row under the Model you're not changing a sub-part of the relation, you're replacing the whole relational value with a new value. INSERT doesn't add a new row to an existing relation, it computes a new relation value that is the UNION of the old relation and the new row. Similar manipulations can apply the UPDATE and DELETE, but in all cases you're not altering a relation, you're computing a new one. So, in terms of the model that defines both what a relation is and what it can contain, relations are atomic-- and from that RVAs are naturally allowed. At this point, if you're raising an eyebrow and thinking that sounds like a pretty questionable slight of hand in the formal definition of atomic values, I'd be quick to agree. But here's another point that is, perhaps, more important: Any relation with a RVA can be decomposed into a non-RVA relation. And, just as importantly, can be re-composed back into the original RVA. That is, you can "round trip" in and out of a RVA representationwithout any loss of data or meaning. One can make a strong argument that RVAs are just a different way to organize the data "on paper", but in the end the addition (or removal) of RVA doesn't alter what the Model is capable of representing or doing. This works the same way that any static multi-dimensional array type can be reduced to a one-dimensional array type without loss of functionality. And it is *this* point that separates RVAs from storing arrays, lists, or other "multi-value" containers as a row/column
Re: [sqlite] Table within a table??
Darren Duncan wrote: > > > Or at least it is in the version of the relational model > that allows non-scalar attribute values, but that is the one that Chris > Date et > al, as well as myself ascribe to. > > I didn't read this through but I recall Chris Date defining a relational database as formed from relations normalised to 1NF. First Normal Form basically means no substructure ie each column is a single value from a scalar domain. To say it would be nice to have structured data is ignoring two things: i) we've got this for just fine without it, and ii) SQL is a first-order language so it can't easily handle substructure. -- View this message in context: http://old.nabble.com/Table-within-a-table---tp26125451p26154639.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Table within a table??
On Nov 1, 2009, at 2:32 AM, Jay A. Kreibich wrote: > Anyways... I've gone on long enough. Good luck with your design. > Think a bit, ask good questions, and hopefully we can all see > a different point of view and learn to see something new. Excellent post, thank you :) Along the same lines, "Double-thinking in SQL" is worth a read as well: http://explainextended.com/2009/07/12/double-thinking-in-sql/ -- PA. http://alt.textdrive.com/nanoki/ ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Table within a table??
+1 Jay, for explaining. another +1 for explaining so well. yet another +1 for explaining in such detailed. Heck, if you had written such a long post full of nonsense, you should have gotten +1. Since you wrote such a long post that is so useful and helpful to probably everyone, your post should really be enshrined in the sqlite wiki. On Sat, Oct 31, 2009 at 8:32 PM, Jay A. Kreibichwrote: > > On Sat, Oct 31, 2009 at 01:13:31AM -0400, mark m scratched on the wall: >> O.K. I think I am starting to get the idea. It is just so foreign for me >> to organize things this way. A master work history table for all cases >> almost seems confusing. It will just take a bit of adjustment for me >> to "trust" the database way of doing things. Text files organized in >> the way I described has always made it easy for me to figure things >> out when there was a problem. > > This is not uncommon. If I had to guess, I'd say you're an experienced > applications developer, but are somewhat new to databases. You're > applying your instincts and experience in designing and laying out > runtime-data structures and classes to your database design, but find > yourself stumbling over some of the minor conceptual differences in > this new environment. > > The good news is that this is normal, and nearly everyone makes these > mistakes. The great news is that I've found much of that experience > can be utilized and successfully applied to database design as soon > as you wrap your head around a few minor differences. > > The biggest stumbling block-- and the one you've hit head-on here-- > is to think of tables as *instances* of compound data structures. They > look a lot like arrays or dynamic lists, so this is an easy mistake. > > In your case, you need a bunch of records associated with some other > table row, so the instinct is to create a brand new table to fit that > need. You'll end up with a bunch of tables with the exact same > type-signature (column/type/name pattern), but that's how instances > of data structures or classes work-- this is exactly what you'd do > if that table was some fancy C++ class that offered a dynamic list > of elements. It also provides for a clear and direct route to get > from a main data record to a collection of related sub-members. > > Unfortunately, in the database world, that's the wrong concept and, > as others have pointed out, this will lead to no end of problems. > > Here is the first big rule, and it alone will get you pretty far: > > Tables are data structures and classes. > > If you would define a data structure or class to fit some specific > and well defined purpose, that's a table. The .sql file that > defines your tables should map to the .h file in your head. > > Don't think of tables themselves as multi-element data structures, > think of them as the data structures definitions. It just happens that > in the database environment, every instance of a particular data > structure (i.e. a row) happens to be managed by a global instance pool. > > In extension of that, the database world has no "contains a." All > your compound structures are built from "references a." And the > references that bind everything together are not pointers, but ID > values. > > The other big stumbling block is that most of these references are > "backwards". If you were building this as an application, you'd have > your main records, and each one of those would have a pointer off to > some dynamic list or array that held the log records. In the database > world, the main record simply exists with some type of unique > identifier. Rather than the main record pointing to the associated log > records, the log records point back and the main record they're > associated with. > > This tends to make application developers uncomfortable. If you had > a main record and wanted a list of all the log records associated > with it, you'd want to be able to de-reference some pointer or other > association directly to some container item, like an array or list of > log records. This is generally what leads to the desire to build > multiple tables of the same type, because tables look a lot like > instances of a container item. You're thinking of those tables as > *instances* of multi-element data structures, rather than a global > collection of all sub-instances of that type. > > That's pretty natural. After all, it would be an odd way to build an > application. While many environments have global lists of objects > (e.g. static class variables that hold a list of every instance of > that particular class), you rarely use those lists for building > general associations. In specific, if you were manipulating record X > and wanted a list of all the associated log records for X, you would > never consider putting a "I belong to data X" field into the log > record. This would require scanning the global
Re: [sqlite] Table within a table??
On Sun, 1 Nov 2009 02:27:16 -0400, mark mwrote: > P.S. your developer vs. database perspective should be > a sticky or FAQ for other newbies on this mailing list. If Jay allows (I'm sure he does), feel free to add an entry to: http://www.sqlite.org/cvstrac/wiki The chapter "Hints For Using SQLite More Effectively" seems appropriate. -- ( Kees Nuyt ) c[_] ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Table within a table??
Well, you pretty much guessed right on just about everything going through my head. I am brand new to database programming and have been trying to learn just enough to accomplish what I want to do within my own app. The app I'm working on has evolved over the years into something that could now benefit from something more sophisticated than text file data storage. Your explanations were EXTREMELY helpful. If anything it gives me the confidence that I'm going down the right road. If not for the great help from everyone in this thread I might have abandoned this effort altogether. My developer instincts were telling me that I was missing something important or asking the wrong questions. Thanks again for all the help. P.S. your developer vs. database perspective should be a sticky or FAQ for other newbies on this mailing list. Mark On Sat, Oct 31, 2009 at 9:32 PM, Jay A. Kreibichwrote: > > On Sat, Oct 31, 2009 at 01:13:31AM -0400, mark m scratched on the wall: > > O.K. I think I am starting to get the idea. It is just so foreign for > me > > to organize things this way. A master work history table for all cases > > almost seems confusing. It will just take a bit of adjustment for me > > to "trust" the database way of doing things. Text files organized in > > the way I described has always made it easy for me to figure things > > out when there was a problem. > > This is not uncommon. If I had to guess, I'd say you're an experienced > applications developer, but are somewhat new to databases. You're > applying your instincts and experience in designing and laying out > runtime-data structures and classes to your database design, but find > yourself stumbling over some of the minor conceptual differences in > this new environment. > > The good news is that this is normal, and nearly everyone makes these > mistakes. The great news is that I've found much of that experience > can be utilized and successfully applied to database design as soon > as you wrap your head around a few minor differences. > > The biggest stumbling block-- and the one you've hit head-on here-- > is to think of tables as *instances* of compound data structures. They > look a lot like arrays or dynamic lists, so this is an easy mistake. > > In your case, you need a bunch of records associated with some other > table row, so the instinct is to create a brand new table to fit that > need. You'll end up with a bunch of tables with the exact same > type-signature (column/type/name pattern), but that's how instances > of data structures or classes work-- this is exactly what you'd do > if that table was some fancy C++ class that offered a dynamic list > of elements. It also provides for a clear and direct route to get > from a main data record to a collection of related sub-members. > > Unfortunately, in the database world, that's the wrong concept and, > as others have pointed out, this will lead to no end of problems. > > Here is the first big rule, and it alone will get you pretty far: > >Tables are data structures and classes. > > If you would define a data structure or class to fit some specific > and well defined purpose, that's a table. The .sql file that > defines your tables should map to the .h file in your head. > > Don't think of tables themselves as multi-element data structures, > think of them as the data structures definitions. It just happens that > in the database environment, every instance of a particular data > structure (i.e. a row) happens to be managed by a global instance pool. > > In extension of that, the database world has no "contains a." All > your compound structures are built from "references a." And the > references that bind everything together are not pointers, but ID > values. > > The other big stumbling block is that most of these references are > "backwards". If you were building this as an application, you'd have > your main records, and each one of those would have a pointer off to > some dynamic list or array that held the log records. In the database > world, the main record simply exists with some type of unique > identifier. Rather than the main record pointing to the associated log > records, the log records point back and the main record they're > associated with. > > This tends to make application developers uncomfortable. If you had > a main record and wanted a list of all the log records associated > with it, you'd want to be able to de-reference some pointer or other > association directly to some container item, like an array or list of > log records. This is generally what leads to the desire to build > multiple tables of the same type, because tables look a lot like > instances of a container item. You're thinking of those tables as > *instances* of multi-element data structures, rather than a global > collection of all sub-instances of that type. > > That's pretty
Re: [sqlite] Table within a table??
On Sat, Oct 31, 2009 at 01:13:31AM -0400, mark m scratched on the wall: > O.K. I think I am starting to get the idea. It is just so foreign for me > to organize things this way. A master work history table for all cases > almost seems confusing. It will just take a bit of adjustment for me > to "trust" the database way of doing things. Text files organized in > the way I described has always made it easy for me to figure things > out when there was a problem. This is not uncommon. If I had to guess, I'd say you're an experienced applications developer, but are somewhat new to databases. You're applying your instincts and experience in designing and laying out runtime-data structures and classes to your database design, but find yourself stumbling over some of the minor conceptual differences in this new environment. The good news is that this is normal, and nearly everyone makes these mistakes. The great news is that I've found much of that experience can be utilized and successfully applied to database design as soon as you wrap your head around a few minor differences. The biggest stumbling block-- and the one you've hit head-on here-- is to think of tables as *instances* of compound data structures. They look a lot like arrays or dynamic lists, so this is an easy mistake. In your case, you need a bunch of records associated with some other table row, so the instinct is to create a brand new table to fit that need. You'll end up with a bunch of tables with the exact same type-signature (column/type/name pattern), but that's how instances of data structures or classes work-- this is exactly what you'd do if that table was some fancy C++ class that offered a dynamic list of elements. It also provides for a clear and direct route to get from a main data record to a collection of related sub-members. Unfortunately, in the database world, that's the wrong concept and, as others have pointed out, this will lead to no end of problems. Here is the first big rule, and it alone will get you pretty far: Tables are data structures and classes. If you would define a data structure or class to fit some specific and well defined purpose, that's a table. The .sql file that defines your tables should map to the .h file in your head. Don't think of tables themselves as multi-element data structures, think of them as the data structures definitions. It just happens that in the database environment, every instance of a particular data structure (i.e. a row) happens to be managed by a global instance pool. In extension of that, the database world has no "contains a." All your compound structures are built from "references a." And the references that bind everything together are not pointers, but ID values. The other big stumbling block is that most of these references are "backwards". If you were building this as an application, you'd have your main records, and each one of those would have a pointer off to some dynamic list or array that held the log records. In the database world, the main record simply exists with some type of unique identifier. Rather than the main record pointing to the associated log records, the log records point back and the main record they're associated with. This tends to make application developers uncomfortable. If you had a main record and wanted a list of all the log records associated with it, you'd want to be able to de-reference some pointer or other association directly to some container item, like an array or list of log records. This is generally what leads to the desire to build multiple tables of the same type, because tables look a lot like instances of a container item. You're thinking of those tables as *instances* of multi-element data structures, rather than a global collection of all sub-instances of that type. That's pretty natural. After all, it would be an odd way to build an application. While many environments have global lists of objects (e.g. static class variables that hold a list of every instance of that particular class), you rarely use those lists for building general associations. In specific, if you were manipulating record X and wanted a list of all the associated log records for X, you would never consider putting a "I belong to data X" field into the log record. This would require scanning the global list of log records to get from a main record to all the associated log records. Not only is that somewhat indirect, it sounds very inefficient. But this is exactly what you do in the database world. The main difference being that this is exactly the kind of thing that databases are very very good at-- both in terms of optimized searches for specific records in a large collection (if properly indexed) as well as powerful set manipulations that allow very complex
Re: [sqlite] Table within a table??
O.K. I think I am starting to get the idea. It is just so foreign for me to organize things this way. A master work history table for all cases almost seems confusing. It will just take a bit of adjustment for me to "trust" the database way of doing things. Text files organized in the way I described has always made it easy for me to figure things out when there was a problem. I will give this a try. Thanks very much for all the help. On Fri, Oct 30, 2009 at 4:04 PM, Darren Duncanwrote: > mark m wrote: > > Thanks very much!! It also occurred to me that I could have a Table > named > > "case1" and another > > named "case1workhist". The RDBMS wouldn't know they were related but my > > application could be > > set up to know this. > > > > Here is more detail on my current data organization: > > > > Open Cases > >Case 1... > >Case 2... > > field 1 > > field 2 > > work history > > item 1 > > item 2 > > worker ID > > duration > > type > > rate > > item 3 > > ... > > ... > > item n > >Case 3 > >... > >... > >Case n > > > > In my app, a certain case is chosen to be displayed. All of the above > > information is displayed in one form > > or another within my app. The user can change any given piece of info or > > add new information. So, I guess > > I could just look for the tables "Casen" and "casenworkhist" to display > my > > info. I do however need to go through > > and calculate all the hours for all open cases and other calculations > like > > that. I want to be sure I'm setting things > > up in a way that will allow me to do this with minimal overhead. > > > > In your method I would only have 1 workhist table?? whereas in my method > I > > would have n workhist tables. Is it > > better to have only 1 and use a foriegn key like you describe to link > work > > history records with a given case?? Is the > > rule generally to minimize the number of tables?? > > A rule for relational database best practices is to minimize the number of > tables that are mutually homogeneous in meaning and structure, and to allow > or > exploit multiple tables that are mutually heterogeneous. If you have a > conceptual reason for having multiple same-looking tables, then you encode > that > as an extra column in the 1 table. > > So for example, the work history details for *all* of your cases would go > in *1* > work_history table, not a separate table for each case, and you would have > a > field in work_history called case_number to identify which records of that > table > belong to each case. > > Given the hierarchy you mention, a table layout like this might work: > > CREATE TABLE cases ( > case_number INTEGER PRIMARY KEY, > , > > ) > > CREATE TABLE workers ( > worker_id INTEGER PRIMARY KEY, > ... > ) > > CREATE TABLE work_histories ( > work_history_item_number INTEGER PRIMARY KEY, > case_number INTEGER, > worker_id INTEGER, > duration, > type, > rate, > UNIQUE KEY (case_number, worker_id) # or make this pk instead > FOREIGN KEY (case_number) REFERENCES cases (case_number), > FOREIGN KEY (worker_id) REFERENCES workers (worker_id) > ) > > By the way, I name my tables in plural to describe what the table as a > whole > represents, (much like how one might name an array variable in a program, > which > is what a table is analogous to save for not being ordered), which is for > example a collection of cases. But some people will tell you tables should > be > named after what an individual record in it represents; I disagree with > them (it > is like naming an array after what one of its elements is) but it is > commonly > practiced, and its a style issue in the end, the DBMS doesn't care. > > -- Darren Duncan > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Table within a table??
mark m wrote: > Thanks very much!! It also occurred to me that I could have a Table named > "case1" and another > named "case1workhist". The RDBMS wouldn't know they were related but my > application could be > set up to know this. > > Here is more detail on my current data organization: > > Open Cases >Case 1... >Case 2... > field 1 > field 2 > work history > item 1 > item 2 > worker ID > duration > type > rate > item 3 > ... > ... > item n >Case 3 >... >... >Case n > > In my app, a certain case is chosen to be displayed. All of the above > information is displayed in one form > or another within my app. The user can change any given piece of info or > add new information. So, I guess > I could just look for the tables "Casen" and "casenworkhist" to display my > info. I do however need to go through > and calculate all the hours for all open cases and other calculations like > that. I want to be sure I'm setting things > up in a way that will allow me to do this with minimal overhead. > > In your method I would only have 1 workhist table?? whereas in my method I > would have n workhist tables. Is it > better to have only 1 and use a foriegn key like you describe to link work > history records with a given case?? Is the > rule generally to minimize the number of tables?? A rule for relational database best practices is to minimize the number of tables that are mutually homogeneous in meaning and structure, and to allow or exploit multiple tables that are mutually heterogeneous. If you have a conceptual reason for having multiple same-looking tables, then you encode that as an extra column in the 1 table. So for example, the work history details for *all* of your cases would go in *1* work_history table, not a separate table for each case, and you would have a field in work_history called case_number to identify which records of that table belong to each case. Given the hierarchy you mention, a table layout like this might work: CREATE TABLE cases ( case_number INTEGER PRIMARY KEY, , ) CREATE TABLE workers ( worker_id INTEGER PRIMARY KEY, ... ) CREATE TABLE work_histories ( work_history_item_number INTEGER PRIMARY KEY, case_number INTEGER, worker_id INTEGER, duration, type, rate, UNIQUE KEY (case_number, worker_id) # or make this pk instead FOREIGN KEY (case_number) REFERENCES cases (case_number), FOREIGN KEY (worker_id) REFERENCES workers (worker_id) ) By the way, I name my tables in plural to describe what the table as a whole represents, (much like how one might name an array variable in a program, which is what a table is analogous to save for not being ordered), which is for example a collection of cases. But some people will tell you tables should be named after what an individual record in it represents; I disagree with them (it is like naming an array after what one of its elements is) but it is commonly practiced, and its a style issue in the end, the DBMS doesn't care. -- Darren Duncan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Table within a table??
On Fri, Oct 30, 2009 at 2:41 AM, mark mwrote: > Thanks very much!! It also occurred to me that I could have a Table named > "case1" and another > named "case1workhist". The RDBMS wouldn't know they were related but my > application could be > set up to know this. Expand Darren's suggestion with the following -- CREATE TABLE worker ( worker_id INTEGER PRIMARY KEY, b TEXT ); CREATE TABLE work_history ( worker_id INTEGER, c INTEGER, d TEXT, CONSTRAINT FOREIGN KEY (worker_id) REFERENCING worker (worker_id) case_id INTEGER ); CREATE TABLE cases ( case_id INTEGER PRIMARY KEY, e TEXT ); So, above I have added a table for cases where you can store your case history, and then, in your work_history table, added a case_id FK. > > Here is more detail on my current data organization: > > Open Cases > Case 1... > Case 2... > field 1 > field 2 > work history > item 1 > item 2 > worker ID > duration > type > rate > item 3 > ... > ... > item n > Case 3 > ... > ... > Case n > > In my app, a certain case is chosen to be displayed. All of the above > information is displayed in one form > or another within my app. The user can change any given piece of info or > add new information. So, I guess > I could just look for the tables "Casen" and "casenworkhist" to display my > info. I do however need to go through > and calculate all the hours for all open cases and other calculations like > that. I want to be sure I'm setting things > up in a way that will allow me to do this with minimal overhead. > > In your method I would only have 1 workhist table?? whereas in my method I > would have n workhist tables. Is it > better to have only 1 and use a foriegn key like you describe to link work > history records with a given case?? Is the > rule generally to minimize the number of tables?? > > Mark > > On Fri, Oct 30, 2009 at 2:44 AM, Darren Duncan wrote: > >> mark m wrote: >> > I'm very new to database programming so this question is pretty basic >> > >> > I have data that is currently organized as follows: >> > >> > Each case has several fields that contain only one value. There are >> several >> > fields that have a pipe-delimited string >> > that represents a work history. Each work history item has its own >> fields >> > like data, worker ID etc. So, as I convert >> > my text files over to database format, I find myself wanting to have a >> table >> > within a table. So, ideally the case table would >> > have several singular fields and a field named "work history" that would >> > contain a table that would have all of the work history >> > in it. >> > >> > But, I haven't found a way to do this. If this is not possible, what >> would >> > be the best way to organize data such as this in >> > a database program. >> > >> > Thanks for the help. >> > >> > Mark >> >> What you are talking about is perfectly reasonable from a logical >> standpoint, >> and in the relational model the feature would be called "relation-valued >> attributes" or "RVAs". Or at least it is in the version of the relational >> model >> that allows non-scalar attribute values, but that is the one that Chris >> Date et >> al, as well as myself ascribe to. Logically speaking, RVAs are what you >> get as >> the intermediate stage of a "GROUP BY", and are the input for aggregate >> operators like SUM()/COUNT()/MIN()/MAX()/etc. >> >> However, SQLite and many other SQL DBMSs don't support RVAs, so you'll have >> to >> use a logically equivalent arrangement of 2 sibling tables that have a >> parent/child (say, "worker"/"work_history") foreign-key relationship. >> >> That is, instead of this kind of schema (in pseudocode): >> >> var worker : TABLE { >> worker_id : INT, b : TEXT, work_history : TABLE { c : INT, d : TEXT } >> } >> >> ... you have this kind of schema: >> >> var worker : TABLE { worker_id : INT, b : TEXT } >> var work_history : TABLE { worker_id : INT, c : INT, d : TEXT } >> >> ... and work_history.worker_id has a foreign key constraint on >> worker.worker_id >> . The SQL to do the latter is approximately: >> >> CREATE TABLE worker ( >> worker_id INTEGER PRIMARY KEY, >> b TEXT >> ); >> >> CREATE TABLE work_history ( >> worker_id INTEGER, >> c INTEGER, >> d TEXT, >> CONSTRAINT FOREIGN KEY (worker_id) REFERENCING worker (worker_id) >> ); >> >> You would have a record in "worker" for each distinct "worker_id" and that >> record contains all the details that aren't part of the work history. Then >> you >> have a record in "work_history" for each record that would have been in the >> inner table of "worker" had it existed, and you have the added "worker_id" >> field >> in "work_history" for every history row that would refer to the same >> worker. >> Having the same "worker_id"
Re: [sqlite] Table within a table??
Thanks very much!! It also occurred to me that I could have a Table named "case1" and another named "case1workhist". The RDBMS wouldn't know they were related but my application could be set up to know this. Here is more detail on my current data organization: Open Cases Case 1... Case 2... field 1 field 2 work history item 1 item 2 worker ID duration type rate item 3 ... ... item n Case 3 ... ... Case n In my app, a certain case is chosen to be displayed. All of the above information is displayed in one form or another within my app. The user can change any given piece of info or add new information. So, I guess I could just look for the tables "Casen" and "casenworkhist" to display my info. I do however need to go through and calculate all the hours for all open cases and other calculations like that. I want to be sure I'm setting things up in a way that will allow me to do this with minimal overhead. In your method I would only have 1 workhist table?? whereas in my method I would have n workhist tables. Is it better to have only 1 and use a foriegn key like you describe to link work history records with a given case?? Is the rule generally to minimize the number of tables?? Mark On Fri, Oct 30, 2009 at 2:44 AM, Darren Duncanwrote: > mark m wrote: > > I'm very new to database programming so this question is pretty basic > > > > I have data that is currently organized as follows: > > > > Each case has several fields that contain only one value. There are > several > > fields that have a pipe-delimited string > > that represents a work history. Each work history item has its own > fields > > like data, worker ID etc. So, as I convert > > my text files over to database format, I find myself wanting to have a > table > > within a table. So, ideally the case table would > > have several singular fields and a field named "work history" that would > > contain a table that would have all of the work history > > in it. > > > > But, I haven't found a way to do this. If this is not possible, what > would > > be the best way to organize data such as this in > > a database program. > > > > Thanks for the help. > > > > Mark > > What you are talking about is perfectly reasonable from a logical > standpoint, > and in the relational model the feature would be called "relation-valued > attributes" or "RVAs". Or at least it is in the version of the relational > model > that allows non-scalar attribute values, but that is the one that Chris > Date et > al, as well as myself ascribe to. Logically speaking, RVAs are what you > get as > the intermediate stage of a "GROUP BY", and are the input for aggregate > operators like SUM()/COUNT()/MIN()/MAX()/etc. > > However, SQLite and many other SQL DBMSs don't support RVAs, so you'll have > to > use a logically equivalent arrangement of 2 sibling tables that have a > parent/child (say, "worker"/"work_history") foreign-key relationship. > > That is, instead of this kind of schema (in pseudocode): > > var worker : TABLE { > worker_id : INT, b : TEXT, work_history : TABLE { c : INT, d : TEXT } > } > > ... you have this kind of schema: > > var worker : TABLE { worker_id : INT, b : TEXT } > var work_history : TABLE { worker_id : INT, c : INT, d : TEXT } > > ... and work_history.worker_id has a foreign key constraint on > worker.worker_id > . The SQL to do the latter is approximately: > > CREATE TABLE worker ( > worker_id INTEGER PRIMARY KEY, > b TEXT > ); > > CREATE TABLE work_history ( > worker_id INTEGER, > c INTEGER, > d TEXT, > CONSTRAINT FOREIGN KEY (worker_id) REFERENCING worker (worker_id) > ); > > You would have a record in "worker" for each distinct "worker_id" and that > record contains all the details that aren't part of the work history. Then > you > have a record in "work_history" for each record that would have been in the > inner table of "worker" had it existed, and you have the added "worker_id" > field > in "work_history" for every history row that would refer to the same > worker. > Having the same "worker_id" values in both sibling tables tells the RDBMS > which > rows in the 2 tables correspond to each other. > > -- Darren Duncan > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Table within a table??
mark m wrote: > I'm very new to database programming so this question is pretty basic > > I have data that is currently organized as follows: > > Each case has several fields that contain only one value. There are several > fields that have a pipe-delimited string > that represents a work history. Each work history item has its own fields > like data, worker ID etc. So, as I convert > my text files over to database format, I find myself wanting to have a table > within a table. So, ideally the case table would > have several singular fields and a field named "work history" that would > contain a table that would have all of the work history > in it. > > But, I haven't found a way to do this. If this is not possible, what would > be the best way to organize data such as this in > a database program. > > Thanks for the help. > > Mark What you are talking about is perfectly reasonable from a logical standpoint, and in the relational model the feature would be called "relation-valued attributes" or "RVAs". Or at least it is in the version of the relational model that allows non-scalar attribute values, but that is the one that Chris Date et al, as well as myself ascribe to. Logically speaking, RVAs are what you get as the intermediate stage of a "GROUP BY", and are the input for aggregate operators like SUM()/COUNT()/MIN()/MAX()/etc. However, SQLite and many other SQL DBMSs don't support RVAs, so you'll have to use a logically equivalent arrangement of 2 sibling tables that have a parent/child (say, "worker"/"work_history") foreign-key relationship. That is, instead of this kind of schema (in pseudocode): var worker : TABLE { worker_id : INT, b : TEXT, work_history : TABLE { c : INT, d : TEXT } } ... you have this kind of schema: var worker : TABLE { worker_id : INT, b : TEXT } var work_history : TABLE { worker_id : INT, c : INT, d : TEXT } ... and work_history.worker_id has a foreign key constraint on worker.worker_id . The SQL to do the latter is approximately: CREATE TABLE worker ( worker_id INTEGER PRIMARY KEY, b TEXT ); CREATE TABLE work_history ( worker_id INTEGER, c INTEGER, d TEXT, CONSTRAINT FOREIGN KEY (worker_id) REFERENCING worker (worker_id) ); You would have a record in "worker" for each distinct "worker_id" and that record contains all the details that aren't part of the work history. Then you have a record in "work_history" for each record that would have been in the inner table of "worker" had it existed, and you have the added "worker_id" field in "work_history" for every history row that would refer to the same worker. Having the same "worker_id" values in both sibling tables tells the RDBMS which rows in the 2 tables correspond to each other. -- Darren Duncan ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Table within a table??
I'm very new to database programming so this question is pretty basic I have data that is currently organized as follows: Each case has several fields that contain only one value. There are several fields that have a pipe-delimited string that represents a work history. Each work history item has its own fields like data, worker ID etc. So, as I convert my text files over to database format, I find myself wanting to have a table within a table. So, ideally the case table would have several singular fields and a field named "work history" that would contain a table that would have all of the work history in it. But, I haven't found a way to do this. If this is not possible, what would be the best way to organize data such as this in a database program. Thanks for the help. Mark ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users