On Sat, Oct 31, 2009 at 01:13:31AM -0400, mark m scratched on the wall:
> O.K.  I think I am starting to get the idea.  It is just so foreign for me
> to organize things this way. A master work history table for all cases
> almost seems confusing.  It will just take a bit of adjustment for me
> to "trust" the database way of doing things.  Text files organized in
> the way I described has always made it easy for me to figure things
> out when there was a problem.

  This is not uncommon.  If I had to guess, I'd say you're an experienced
  applications developer, but are somewhat new to databases.  You're
  applying your instincts and experience in designing and laying out
  runtime-data structures and classes to your database design, but find
  yourself stumbling over some of the minor conceptual differences in
  this new environment.

  The good news is that this is normal, and nearly everyone makes these
  mistakes.  The great news is that I've found much of that experience
  can be utilized and successfully applied to database design as soon
  as you wrap your head around a few minor differences.

  The biggest stumbling block-- and the one you've hit head-on here--
  is to think of tables as *instances* of compound data structures.  They
  look a lot like arrays or dynamic lists, so this is an easy mistake.

  In your case, you need a bunch of records associated with some other
  table row, so the instinct is to create a brand new table to fit that
  need.  You'll end up with a bunch of tables with the exact same
  type-signature (column/type/name pattern), but that's how instances
  of data structures or classes work-- this is exactly what you'd do
  if that table was some fancy C++ class that offered a dynamic list
  of elements.  It also provides for a clear and direct route to get
  from a main data record to a collection of related sub-members.

  Unfortunately, in the database world, that's the wrong concept and,
  as others have pointed out, this will lead to no end of problems.
 
  Here is the first big rule, and it alone will get you pretty far:

                Tables are data structures and classes.
  
  If you would define a data structure or class to fit some specific
  and well defined purpose, that's a table.  The .sql file that
  defines your tables should map to the .h file in your head.  
  
  Don't think of tables themselves as multi-element data structures,
  think of them as the data structures definitions.  It just happens that
  in the database environment, every instance of a particular data
  structure (i.e. a row) happens to be managed by a global instance pool.
  
  In extension of that, the database world has no "contains a."   All
  your compound structures are built from "references a."  And the
  references that bind everything together are not pointers, but ID
  values.

  The other big stumbling block is that most of these references are
  "backwards".  If you were building this as an application, you'd have
  your main records, and each one of those would have a pointer off to
  some dynamic list or array that held the log records.  In the database
  world, the main record simply exists with some type of unique
  identifier.  Rather than the main record pointing to the associated log
  records, the log records point back and the main record they're
  associated with.

  This tends to make application developers uncomfortable.  If you had
  a main record and wanted a list of all the log records associated
  with it, you'd want to be able to de-reference some pointer or other
  association directly to some container item, like an array or list of
  log records.  This is generally what leads to the desire to build
  multiple tables of the same type, because tables look a lot like
  instances of a container item.  You're thinking of those tables as
  *instances* of multi-element data structures, rather than a global
  collection of all sub-instances of that type.

  That's pretty natural.  After all, it would be an odd way to build an
  application.  While many environments have global lists of objects
  (e.g. static class variables that hold a list of every instance of
  that particular class), you rarely use those lists for building
  general associations.  In specific, if you were manipulating record X
  and wanted a list of all the associated log records for X, you would
  never consider putting a "I belong to data X" field into the log
  record.  This would require scanning the global list of log records
  to get from a main record to all the associated log records.  Not
  only is that somewhat indirect, it sounds very inefficient.

  But this is exactly what you do in the database world.  The main
  difference being that this is exactly the kind of thing that
  databases are very very good at-- both in terms of optimized searches
  for specific records in a large collection (if properly indexed) as
  well as powerful set manipulations that allow very complex
  manipulation of large sets of data as single operations.

  This second point is actually more important than many people first
  realize, and that's fine.  But as you get better with databases
  you'll start to think in doing operations to large sets of data
  items, rather than direct manipulation of discrete items.  That comes
  later, however, and for now it isn't a big limitation to just ignore it.



  So... to make the jump from application development to database design,
  I'd first start by more or less thinking things through the same way.
  If you have years of experience designing data models and class trees,
  there is no good reason to throw all that experience and understanding
  out.  Use it to lay out your block diagrams and datatypes, but with a
  few specific changes:

  1) Tables are your data structure definitions.  If you would organize
  something by defining a data-structure or class, that's a table in
  your database.

  2) Rather than pointers, you use Foreign Key ID values to bind
  different data structures (tables) together.

  3) Most of the "pointers" or references are backwards.  Rather than
  a master record holding pointers to sub-records, in a database it is
  the sub-records that have the Foreign Keys that point back to the
  master record.



  That will get you pretty far.  I could come up with some more points,
  but that would already be getting into some pretty find details.  The
  only other thing I might add is that there are some pretty standard
  idioms in database design, like One-to-Many and Many-to-Many
  relationships, that are worth a quick web-search and doing some quick
  reading about.  Their use and purpose should be fairly obvious to the
  traditional developer, and provides more reinforcement in bridging
  your traditional programming experience to the database world.



  I also realize that if there are any traditional database data
  modelers out there (that are still reading), I know you're cringing
  pretty hard right now.  At some point, if you want to become very
  very good at database design, you do need to uncouple your database
  thinking from your programming language thinking.  Personally I think
  the bigger obstacle is "thinking in sets" but there are also a number
  of subtle things in the database design where this is true.
  
  That said, most of us on this list don't need to get that far--
  especially most of us working with SQLite.  Many of us are developers
  that have some nagging data problem that has gotten a bit out of
  hand, and we found this easy, fast, drop-in database to replace our
  ugly hacks with text files and out-of-control file formats.  I know
  there are a few people out there doing some pretty heavy-lifting
  work-- and I'm definitely not saying the folks around here aren't
  "real" database programmers-- but I'm guessing a fair number of us are
  SQLite users for secondary reasons (I'd like to hear if anyone thinks
  I'm totally off on this one).  For the majority of those people, a
  few basic rules and a few standard design idioms is all we need to
  address our needs and get ou with our "real" work, quite thankful
  that SQLite is so easy to integrate and makes so many ugly data
  management issues just disappear.  I know that's where I am.


  Anyways... I've gone on long enough.  Good luck with your design.
  Think a bit, ask good questions, and hopefully we can all see
  a different point of view and learn to see something new.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to