[sqlite] The Simplest SQLite Common Table Expression Tutorial

Gert Van Assche Mon, 28 Sep 2015 22:46:30 +0200

You're a great teacher, David. Thanks for sharing this!

gert


2015-09-25 22:17 GMT+02:00 David Barrett <dbarrett at expensify.com>:

> Hey all, just wanted to share this in case anybody is also looking for a
> very simple tutorial for CTE's in sqlite:
>
>
> http://blog.expensify.com/2015/09/25/the-simplest-sqlite-common-table-expression-tutorial/
>
> The Simplest SQLite Common Table Expression Tutorial
>
> I?ve been trying to wrap my head aroundCommon Table Expressions
> <https://www.sqlite.org/lang_with.html> for a while, and all the tutorials
> I?ve read started out with ?simple? examples that were way too advanced for
> me to follow. Here?s my attempt to write a tutorial that starts as simple
> as possible.
>
> First, let?s start with the simplest query:
>
> sqlite> SELECT 1;
> 1
> sqlite>
>
> All this does is return a result set containing a row. Next, consider the
> simplest subquery:
>
> sqlite> SELECT * FROM ( SELECT 1 );
> 1
> sqlite>
>
> This just selects all the results from the subquery ? which in this case,
> is just a single row. A ?Common Table Expression? is basically the same as
> a subquery, except assigned a name and defined prior to the query in which
> it?s referenced. Accordingly, the simplest CTE version of the above query
> would be like:
>
> sqlite> WITH one AS ( SELECT 1 )
>         SELECT * FROM one;
> 1
> sqlite>
>
> Breaking that down a bit further:
>
>    - We?ve defined a common table expression named ?one?
>    - We?ve ?filled? it with the output of SELECT 1, which is just 1 row
>    - Then we selected everything from ?one?
>    - Such that the final result is a single value: 1
>
> But a CTE can have multiple columns, too, and those columns can be assigned
> names:
>
> sqlite> WITH twoCol( a, b ) AS ( SELECT 1, 2 )
>         SELECT a, b FROM twoCol;
> 1|2
> sqlite>
>
> Similarly, a CTE can query other tables:
>
> sqlite> CREATE TABLE foo ( bar INTEGER );
> sqlite> INSERT INTO foo VALUES(1);
> sqlite> INSERT INTO foo VALUES(2);
> sqlite> SELECT * FROM foo;
> 1
> 2
> sqlite> WITH fooCTE AS (SELECT * FROM foo)
>         SELECT * FROM fooCTE;
> 1
> 2
> sqlite>
>
> Additionally, you can define as many CTEs as you want in a single query:
>
> sqlite> WITH aCTE AS (SELECT 'a'),
>              bCTE AS (SELECT 'b')
>         SELECT * FROM aCTE, bCTE;
> a|b
> sqlite>
>
> So, common table expressions can be used to restructure a query to make it
> more readable, by moving the subqueries out in front. But the real power of
> common table expressions is when you define an expression that recursively
> selects itself. They key to this is using a ?Compound Select Statements?,
> such as the UNION ALL operator. This just combines two result sets into one
> (so long as they have the same number of columns):
>
> sqlite> SELECT 1, 2
>             UNION ALL
>         SELECT 3, 4;
> 1|2
> 3|4
> sqlite>
>
> Take this example:
>
> sqlite> WITH RECURSIVE infinite AS (
>             SELECT 1
>                 UNION ALL
>             SELECT * FROM infinite
>         )
>         SELECT * FROM infinite;
> ^CError: interrupted
> sqlite>
>
> Let?s break down why that query will never finish:
>
>    - ?WITH RECURSIVE infinite? defines a common table expression named
>    ?infinite?
>    - ?SELECT 1? seeds that CTE?s output with a single row ? containing ?1?
>    - Next the ?UNION ALL? says ?combine the output of what?s on the left,
>    with the output of what?s on the right
>    - And on the right we do ?SELECT * FROM infinite? ? meaning, select
>    everything currently in the table.
>    - The result is we?re defining a common table expression named
>    ?infinite? to be the union of ?a single row? and ?all other rows?.
>    - Because no ?cap? has been placed on this (via a WHERE or LIMIT), this
>    means we?ve defined an infinitely recurring CTE. Fun!
>
> So we can ?cap? that CTE by writing a query like:
>
> sqlite> WITH RECURSIVE finite AS (
>             SELECT 1
>                 UNION ALL
>             SELECT * FROM finite LIMIT 2
>         )
>         SELECT * FROM finite;
> 1
> 1
> sqlite>
>
> This does the same basic thing, but we?ve limited the number of possible
> results to only be 2. Ok, so that?s all well and good, but what is this
> good for? It turns out, a lot. Say you wanted to generate a table on the
> fly containing the numbers one through ten:
>
> sqlite> WITH RECURSIVE ten(x) AS (
>             SELECT 1
>                 UNION ALL
>             SELECT x+1 FROM ten WHERE x<10
>         )
>         SELECT * FROM ten;
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> sqlite>
>
> To do this, we?ve defined a CTE named ?ten?, with a single column named ?x?
> (the column name is optional, but in this case we need it to refer to
> later). Then in the recursive UNION ALL, we keep adding one more row to the
> result set ? each one larger than the row before ? until we reach a limit
> of 10.
>
> So CTEs can be used to generate a wide array of different types of data ?
> such as date ranges, perhaps to join against when doing a historical
> analysis against a sparse dataset (where some months have no data, so a
> simple group-by won?t suffice):
>
> sqlite> WITH RECURSIVE dates(x) AS (
>             SELECT '2015-01-01'
>                 UNION ALL
>             SELECT DATE(x, '+1 MONTHS') FROM dates WHERE x<'2016-01-01'
>         )
>         SELECT * FROM dates;
> 2015-01-01
> 2015-02-01
> 2015-03-01
> 2015-04-01
> 2015-05-01
> 2015-06-01
> 2015-07-01
> 2015-08-01
> 2015-09-01
> 2015-10-01
> 2015-11-01
> 2015-12-01
> 2016-01-01
> sqlite>
>
> But recursive CTEs can do more than just generate data. They?re essentially
> a powerful programming language that can be used to perform complex
> transformations on data ? such as converting a comma-separated list into a
> table that can be joined against:
>
> sqlite> WITH RECURSIVE list( element, remainder ) AS (
>             SELECT NULL AS element, '1,2,3,4,5' AS remainder
>                 UNION ALL
>             SELECT
>                 CASE
>                     WHEN INSTR( remainder, ',' )>0 THEN
>                         SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
>                     ELSE
>                         remainder
>                 END AS element,
>                 CASE
>                     WHEN INSTR( remainder, ',' )>0 THEN
>                         SUBSTR( remainder, INSTR( remainder, ',' )+1 )
>                     ELSE
>                         NULL
>                 END AS remainder
>             FROM list
>             WHERE remainder IS NOT NULL
>         )
>         SELECT element FROM list WHERE element IS NOT NULL;
> 1
> 2
> 3
> 4
> 5
> sqlite>
>
> To explain a bit more about how that one works:
>
>    - Imagine a table with two columns: element, and remainder. The
>    ?element? of each row contains a single element of the list, whereas
>    ?remainder? contains everything that comes after it.
>    - We start off by seeding the table with a single row containing a NULL
>    element, and a ?remainder? containing the list we want to transform:
>    ?1,2,3,4,5?
>    - Then we define the recursive CTE to generate a new row containing the
>    first element of the list (everything before the first comma), and the
>    remainder (everything after the first comma)
>    - Unfortunately sqlite doesn?t support the IF syntax, so I?m using the
>    slightly more confusing (but way more powerful) CASE syntax, but
> basically
>    it?s saying ?if there is a comma, then everything before is an element,
> and
>    everything after is the remainder. If there is no comma, then the whole
>    thing is the element, and there is no remainder.?
>    - Finally, it says ?keep recursing until there is no more remainder?
>    - The result is a table containing one row per element, plus one NULL
>    element (which is what we seeded the CTE?s result table with)
>    - So we select everything except for that NULL element, and voila, we?ve
>    converted a comma-separated list into a table, purely in SQL.
>
> Now that might all sound academic. But it?s not uncommon for a database to
> have some table (eg, ?nameValuePairs?) that stores arbitrary data ? and
> sometimes that data is a comma separated list. Without the CTE, you can get
> as far as obtaining that list in the form of a string, but then you need to
> post-process it with Python or some scripting language to complete your
> analysis. With this CTE, you can convert the list into a standard result
> set, and then continue your analysis with the full power of SQL ? including
> joining that result set against other tables.
>
> For example, consider if you had some Github-style commenting system, where
> users could refer to each other inline by username (eg, ?This fixes a bug
> introduced by @quinthar <https://github.com/quinthar>, thanks @genintho
> <https://github.com/genintho> for the tip!?). And imagine this were stored
> as a simple string in a ?comments? table. Normally it would be impossible
> to write a pure SQL query to determine which users were referenced inside
> the comment, so as to look up which to notify. But you could modify the CTE
> above to replace the ?comma? with an ?@?, and thus generate a list of all
> usernames mentioned in a comment. Then you could take that output and join
> against your account table to look up the email address for each of those
> users ? producing a clean list of distinct email addresses to notify.
> Indeed, you could then put that straight into an INSERT statement to insert
> a command into a ?notifications? table, queueing the email for later
> sending.
>
> So CTEs enable transformations to happen inside SQL that previously would
> force you to switch languages. We?ve already talked about list-based
> transformations. But the last I?ll discuss sounds crazy: processing
> hierarchical information inside SQL.
>
> Everybody knows relational databases are based on tables containing rows,
> and that you can join those tables together to produce ? in effect ? super
> large tables that can be analyzed as one. And many people know that you can
> join tables against themselves to do pretty tricky analyses. But without a
> CTE, at best you can join something against itself a fixed number of times
> ? however many times you write the query to do. Enter the CTE and you can
> effectively join a table against itself as many times as it takes to
> complete the analysis.
>
> Take for example, a company?s expense report approval hierarchy. This can
> be stored in a simple ?company? table, where everybody?s expense report
> ?approver? can be easily looked up by the employee?s ?name?. The approval
> structure for any specific employee is both their approver, their
> approver?s approver, and so on until you reach someone who has no approver.
> The hierarchy might have different ?depths? for different employees, so
> there?s no way to write a non-recursive query to select an employee?s
> approvers directly. But you can with a recursive CTE:
>
> sqlite> CREATE TABLE company ( name, approver );
> sqlite> INSERT INTO company VALUES ( 'David', NULL ), ( 'Matt',
> 'David' ), ( 'Jason', 'David' ), ( 'Ryan', 'David' ), ( 'Mike', 'Matt'
> ), ( 'Carlos', 'Matt' ), ( 'Garrett', 'Jason' ), ( 'Puneet', 'Jason'
> ), ( 'Joanie', 'Ryan' );
>
> sqlite> WITH RECURSIVE approvers(x) AS (
>             SELECT 'Joanie'
>                 UNION ALL
>             SELECT company.approver
>             FROM company, approvers
>             WHERE company.name=approvers.x AND company.approver IS NOT
> NULL
>         )
>         SELECT * FROM approvers;
> Joanie
> Ryan
> David
> sqlite>
>
> The upshot of all this is Common Table Expressions allow for extremely
> powerful data generation, transformation, and hierarchical querying
> capabilities ? directly from within SQL. They take a bit to wrap your head
> around, but they open up an entirely new level of potential for in-database
> analysis and logic. Enjoy!
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

[sqlite] The Simplest SQLite Common Table Expression Tutorial

Reply via email to