[sqlite] Mozilla wiki 'avoid SQLite'

Darko Volaric Sun, 14 Jun 2015 16:25:57 -0700

The irony of your comment (which I entirely agree with) is that because
SQLite (and similar) does so much incredibly important stuff for you, it
ends up being difficult for programmers to use it, especially the lower
level the programming is. They don't understand the model of how the system
works, and how its meant to be used. Quite often they think they can put a
value in a field, and all will be well, because that's what SQL promises.
They don't think about transactions, normalization, referential integrity
or anything else, because they don't need those features. But unfortunately
if you don't then SQL will perform poorly (or incorrectly) because it isn't
a simple system to store data.


For example, I remember encountering a system where the programmers tracked
a trade though various states. Each time it changed states the system would
move (ie, insert then delete) the record from one table, into a nearly
identical table (without a transaction). They essentially used the table to
denote the state. Might make sense in an imperative model, where buffers
are common, but not at all in SQL.

All they're saying here is stop misusing SQLite: if a child cuts itself
with a knife, take away the knife.

On Sun, Jun 14, 2015 at 1:19 PM, Richard Hipp <drh at sqlite.org> wrote:

> On 6/14/15, Scott Hess <shess at google.com> wrote:
> > SQLite essentially gives you a set of
> > b-trees with syntactic sugar over them,
>
> SQL (and I speak in general terms here, not just of SQLite) provides
> way more than syntactic sugar over b-trees.  The syntactic sugar over
> b-trees part is the *least* of the many advantages of SQL.  Other more
> important features include:
>
> (1) Transactions.  Atomic updates with the option to ROLLBACK at any time.
>
> (2) Data abstraction.  The schema concisely and accurately describes
> the data format in a way that is portable across systems and across
> time.  Content stored in an SQL database is far, far more likely to be
> accessible 25 years in the future.  It is also far more likely to be
> accessible today by other applications written in different
> programming languages or implemented on obscure and/or experimental
> hardware or operating systems.
>
> (3) Declarative Programming.  With SQL, the programmer asks the
> machine a question and lets the query planner figure out an
> appropriate algorithm.  A few lines of query text replace hundreds or
> thousands of lines of procedural code needed to implement that query.
> If performance problems are encountered, they can usually be remedied
> by CREATE INDEX, and without changing a single line of code - the
> query procedures devised by the query planner shift automatically.
>
> To be sure, many programmers end up using SQL as a glorified key-value
> store.  You can actually find programs that do "SELECT * FROM table
> WHERE pk=?" on two or three tables and then implement a join as
> procedural code in the application.  Maybe this is what they were
> taught.  Or maybe they never really understood the concept of a join.
> I dunno.  But for whatever reason, you do see a lot of people treating
> SQL as little more than an interface or wire protocol for talking to a
> btree, and thereby bypassing 95% of the goodness of SQL.   But just
> because a lot of programmers do this, does not mean it is right.
>
> Alexander Lloyd (on the Google Spanner team) says that "NoSQL
> databases ... enforce a broadly applied premature optimization on the
> entire system."  What Alexander means is that there are some
> specialized cases where NoSQL-type solutions are required for
> performance, but that those situations are the exception rather than
> the rule.   It is better to use the powerful abstractions provided by
> SQL to build a reliable system, then go back and optimize any
> performance bottlenecks on a case-by-case basis.  Alexander's remarks
> implicitly but clearly reference Don Knuth's assertion that premature
> optimization is the root of all evil.
>
> Another way of thinking about the difference between SQL and NoSQL is
> by analogy to C versus assembly language.  There was a huge debate in
> the software industry in the 1980s about whether applications should
> be coded in assembly or C.  Hand-coded assembly is theoretically
> faster than machine-code generated by a C compiler (or at least it was
> in the 80s - that point is debatable now).  But in practice, programs
> written in C tended to perform better than those written in assembly.
> The reason is that C being higher level enabled programmers to spend
> less time fiddling with bits and more time developing better
> algorithms.  C enables a programmer to spend more time "heads-up"
> thinking about solving the users problem and less them "heads-down"
> worrying about the details of the implementation.  The SQL vs. NoSQL
> situation is quite similar.  You can, in theory, always run faster
> with NoSQL storage.  But in practice, SQL is so much more powerful
> that applications coded using SQL tend to perform better, be more
> reliable, and have more and useful features than those written using
> NoSQL or ad hoc storage solutions.
>
> One clear example of an application suffering from a lack of SQL is
> Git.  Git uses a NoSQL ad hoc key/value data store - files in the .git
> directory.  This is fast (they say).  But it also causes serious
> problems.  For example, finding the children of a commit involves a
> full-table scan of the Git log.  This turns out to be so difficult and
> slow that nobody implements it.  On GitHub (and on every other Git
> interface I've seen) you will find links to the parents of a check-in
> but not its children.  And so when looking at the history of a Git
> project, it is impossible to find out what happened next.  It also
> leads to the vexing problem known as "detached head state".  If the
> Git log had been implemented as an SQL database (SQLite would have
> been a good choice for this, I think, but there are other options)
> then a detached head becomes impossible - that scourge of so many
> hapless Git users simple would have never existed.  And finding (for
> example) the first 30 descendants of a check-in would become a simple
> query something like this:
>
>     WITH RECURSIVE
>       dx(h,m) AS (SELECT child, mtime FROM gitlog WHERE child=$commithash
>                       UNION
>                       SELECT child, mtime FROM gitlog, dx WHERE parent=h
>                       ORDER BY 2)
>     SELECT h, m FROM dx LIMIT 30;
>
> This query takes microseconds.  But as things stand now, doing the
> equivalent computation using the NoSQL Git repository format would
> involve thousands of lines of procedural code and even then would be
> insufferably slow, which perhaps explains why nobody has ever
> implemented it.
>
> So, no, SQL is not just syntactic sugar on a btree.  To anyone who
> thinks it is, I claim that you do not really understand SQL and I
> encourage you to abandon your preconceived notions and look more
> closely at the whole SQL concept.  I think you will find that SQL is
> an exceedingly powerful concept that, when used to its fullest, will
> make your applications faster, more reliable, and easier to write and
> maintain.
>
> --
> D. Richard Hipp
> drh at sqlite.org
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

[sqlite] Mozilla wiki 'avoid SQLite'

Reply via email to