Re: [sqlite] Slow operation

Jay A. Kreibich Wed, 06 Oct 2010 17:46:14 -0700

On Tue, Oct 05, 2010 at 03:44:38PM +0100, Ian Hardingham scratched on the wall:
> Hey Jay, thanks for your feedback.
> 
> I am indeed using (several) delineated lists.  I would very much 
> appreciate your input into how bad a decision this is.


  I hesitate to call it a bad decision, so much as it is rather
  un-database like, and from that standpoint it is a poor database
  design.  It might be an acceptable application design, however.

  There are kind of two camps on this.  If you're more comfortable
  doing everything in a scripting language or client environment, and
  your data needs are fairly simple, then there isn't a big concern
  about the database design.  You're more or less using the database
  as nothing more than a fancy CSV container.  While this is, in a
  sense, missing the point on what a relational database gives you
  and throwing away a huge amount of expression and processing ability,
  sometimes that's all you need-- and if you need to make frequent
  updates and queries, there are still advantages to using a library
  like SQLite vs. an actual CSV or similar file.

  Then there is the feeling that if you're going to be using a database,
  you might as well "do it right" and actually use the database the way
  it was designed to be used.  I tend to agree with that idea,
  especially when the data needs and structures are anything other than
  the most trivial one-table design.  HOWEVER, I'm well aware that many
  programmers-- even seasoned, experienced developers-- don't have any
  kind of background with relational concepts and design.  Without a basic
  understanding of what a database is good at doing (and how to get the
  database to do such tricks), it is very difficult to design toward the
  strengths of a relational system.  You can't "do it right" if you're
  not aware what "right" is.

  I don't really have an answer to that, other than suggesting
  resources to help you learn.  Maybe it's worth your time, maybe it
  isn't.  There are many books and website tutorials on SQL and databases.
  Some of these are for the serious database professional, while some
  are very entry level.  You might enjoy something like "Using SQLite"
  <http://oreilly.com/catalog/9780596521189/>, which was specifically
  written for the experienced developer that, until this point, has
  never had a need to play around with a database.  While it covers the
  SQLite API and the SQL language, it also tries to provide a brief
  "get you started" introduction to database design and the basic
  "design patterns" used in most designs.  (I should also point out
  that I'm somewhat biased in my preference for this book, given
  that I wrote it.)

> So, I basically need to find entrys of Table B that do not appear in 
> that list.  Obviously, it would be better to have a playedInf table and 
> do some kind of SELECT FROM B NOT IN A query.

  Returning to the problem at hand, one of the first principals of
  database design is that each attribute of each row (that is, a
  column/row "cell") should hold only one value.  (There are some that
  will argue that this "value" can be an array, but that is a point for
  the theory people to argue over, and I'm not interested in having that
  discussion.)  Having "singular" values is one of the rules for what
  is called the First Normal Form.  You can Google that for more
  information, if you don't already know what the Normal Forms are all
  about.

  A big reason for singular values is that all the built-in techniques
  for cross indexing data in one table to another table depend on the
  single values, as that allows the database to match things back up.
  This is one of the more difficult aspects of database, however... The
  concept of JOINs and how to effectively use them tend to trip
  newcomers up a bit, and if you don't understand how to use JOINs to
  bring your data back together, it isn't a big surprise you might
  not want to split your data back up.


  So, in your case, if I can guess some of the details, you've got a
  Many-to-Many relationship (i.e. people in groups; people can belong
  to more than one group, groups have more than one member).  This is
  typically done with a people table, a group table, and a "membership"
  table, known as a "link table" or "bridge table" that connects people
  to groups.  You can then use JOINs to bring all that data back
  together and get the answers you need, even if you're just doing
  equi-JOIN filtering (filtering based off JOIN values that aren't
  actually returned).  To find rows that aren't there, you can often use
  OUTER JOINs and search for NULLs where there shouldn't be any, or you
  can use other operations, such as NOT EXISTS( ).

> However, I would rather not do the processing at all.  I'm pushing off 
> the calculation to the client - and the client does not use a database 
> at all. I simply send the list as a text field and get the client to 
> sort it out, as he has plenty of processor time.  It seems to me that 
> this is quicker than making the list on the fly by doing a SELECT and 
> then a concatination from another table...
> 
> Am I completely off-base on this?

  Not at all.  It depends a lot on the data design and the application
  requirements.  Much of the big "No-SQL" movement does exactly this.
  The big secret is that most of these applications are still doing
  relational data operations, they're just pushing the manipulations
  (such as a JOIN) down into the client code, making the "database" a
  more simple level storage engine.  Many of the end results are the
  same, however.  It makes sense when performance is at an utmost
  premium, and allows greater scalability, but it also increases
  application complexity and tends to skirt some issues like
  transactional safety.  Those types of systems also trade flexibility
  in the general case for complexity and speed in the common case.

  Even if that's an appropriate application design, you might still do
  well to use a somewhat traditional database structure.  You still need
  to filter the data before you pass it off to the client, and generally
  the database is going to be better at that then you are.  Having a
  more traditional design also simplifies insertions and updates,
  which can get clumsy and complex in a text field containing a CSV
  list, or something similar.  A traditional design will also let you
  prototype and test using the database in a full SQL style, and then
  optimize the common-case queries, perhaps pulling some of the
  processing or business logic back down into the application.

  Which is "right" depends on what you're trying to do, and the
  compromises and motivations you need to make along the way.

  Kind of like all software engineering.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Slow operation

Reply via email to