You're a great teacher, David. Thanks for sharing this! gert
2015-09-25 22:17 GMT+02:00 David Barrett <dbarrett at expensify.com>: > Hey all, just wanted to share this in case anybody is also looking for a > very simple tutorial for CTE's in sqlite: > > > http://blog.expensify.com/2015/09/25/the-simplest-sqlite-common-table-expression-tutorial/ > > The Simplest SQLite Common Table Expression Tutorial > > I?ve been trying to wrap my head aroundCommon Table Expressions > <https://www.sqlite.org/lang_with.html> for a while, and all the tutorials > I?ve read started out with ?simple? examples that were way too advanced for > me to follow. Here?s my attempt to write a tutorial that starts as simple > as possible. > > First, let?s start with the simplest query: > > sqlite> SELECT 1; > 1 > sqlite> > > All this does is return a result set containing a row. Next, consider the > simplest subquery: > > sqlite> SELECT * FROM ( SELECT 1 ); > 1 > sqlite> > > This just selects all the results from the subquery ? which in this case, > is just a single row. A ?Common Table Expression? is basically the same as > a subquery, except assigned a name and defined prior to the query in which > it?s referenced. Accordingly, the simplest CTE version of the above query > would be like: > > sqlite> WITH one AS ( SELECT 1 ) > SELECT * FROM one; > 1 > sqlite> > > Breaking that down a bit further: > > - We?ve defined a common table expression named ?one? > - We?ve ?filled? it with the output of SELECT 1, which is just 1 row > - Then we selected everything from ?one? > - Such that the final result is a single value: 1 > > But a CTE can have multiple columns, too, and those columns can be assigned > names: > > sqlite> WITH twoCol( a, b ) AS ( SELECT 1, 2 ) > SELECT a, b FROM twoCol; > 1|2 > sqlite> > > Similarly, a CTE can query other tables: > > sqlite> CREATE TABLE foo ( bar INTEGER ); > sqlite> INSERT INTO foo VALUES(1); > sqlite> INSERT INTO foo VALUES(2); > sqlite> SELECT * FROM foo; > 1 > 2 > sqlite> WITH fooCTE AS (SELECT * FROM foo) > SELECT * FROM fooCTE; > 1 > 2 > sqlite> > > Additionally, you can define as many CTEs as you want in a single query: > > sqlite> WITH aCTE AS (SELECT 'a'), > bCTE AS (SELECT 'b') > SELECT * FROM aCTE, bCTE; > a|b > sqlite> > > So, common table expressions can be used to restructure a query to make it > more readable, by moving the subqueries out in front. But the real power of > common table expressions is when you define an expression that recursively > selects itself. They key to this is using a ?Compound Select Statements?, > such as the UNION ALL operator. This just combines two result sets into one > (so long as they have the same number of columns): > > sqlite> SELECT 1, 2 > UNION ALL > SELECT 3, 4; > 1|2 > 3|4 > sqlite> > > Take this example: > > sqlite> WITH RECURSIVE infinite AS ( > SELECT 1 > UNION ALL > SELECT * FROM infinite > ) > SELECT * FROM infinite; > ^CError: interrupted > sqlite> > > Let?s break down why that query will never finish: > > - ?WITH RECURSIVE infinite? defines a common table expression named > ?infinite? > - ?SELECT 1? seeds that CTE?s output with a single row ? containing ?1? > - Next the ?UNION ALL? says ?combine the output of what?s on the left, > with the output of what?s on the right > - And on the right we do ?SELECT * FROM infinite? ? meaning, select > everything currently in the table. > - The result is we?re defining a common table expression named > ?infinite? to be the union of ?a single row? and ?all other rows?. > - Because no ?cap? has been placed on this (via a WHERE or LIMIT), this > means we?ve defined an infinitely recurring CTE. Fun! > > So we can ?cap? that CTE by writing a query like: > > sqlite> WITH RECURSIVE finite AS ( > SELECT 1 > UNION ALL > SELECT * FROM finite LIMIT 2 > ) > SELECT * FROM finite; > 1 > 1 > sqlite> > > This does the same basic thing, but we?ve limited the number of possible > results to only be 2. Ok, so that?s all well and good, but what is this > good for? It turns out, a lot. Say you wanted to generate a table on the > fly containing the numbers one through ten: > > sqlite> WITH RECURSIVE ten(x) AS ( > SELECT 1 > UNION ALL > SELECT x+1 FROM ten WHERE x<10 > ) > SELECT * FROM ten; > 1 > 2 > 3 > 4 > 5 > 6 > 7 > 8 > 9 > 10 > sqlite> > > To do this, we?ve defined a CTE named ?ten?, with a single column named ?x? > (the column name is optional, but in this case we need it to refer to > later). Then in the recursive UNION ALL, we keep adding one more row to the > result set ? each one larger than the row before ? until we reach a limit > of 10. > > So CTEs can be used to generate a wide array of different types of data ? > such as date ranges, perhaps to join against when doing a historical > analysis against a sparse dataset (where some months have no data, so a > simple group-by won?t suffice): > > sqlite> WITH RECURSIVE dates(x) AS ( > SELECT '2015-01-01' > UNION ALL > SELECT DATE(x, '+1 MONTHS') FROM dates WHERE x<'2016-01-01' > ) > SELECT * FROM dates; > 2015-01-01 > 2015-02-01 > 2015-03-01 > 2015-04-01 > 2015-05-01 > 2015-06-01 > 2015-07-01 > 2015-08-01 > 2015-09-01 > 2015-10-01 > 2015-11-01 > 2015-12-01 > 2016-01-01 > sqlite> > > But recursive CTEs can do more than just generate data. They?re essentially > a powerful programming language that can be used to perform complex > transformations on data ? such as converting a comma-separated list into a > table that can be joined against: > > sqlite> WITH RECURSIVE list( element, remainder ) AS ( > SELECT NULL AS element, '1,2,3,4,5' AS remainder > UNION ALL > SELECT > CASE > WHEN INSTR( remainder, ',' )>0 THEN > SUBSTR( remainder, 0, INSTR( remainder, ',' ) ) > ELSE > remainder > END AS element, > CASE > WHEN INSTR( remainder, ',' )>0 THEN > SUBSTR( remainder, INSTR( remainder, ',' )+1 ) > ELSE > NULL > END AS remainder > FROM list > WHERE remainder IS NOT NULL > ) > SELECT element FROM list WHERE element IS NOT NULL; > 1 > 2 > 3 > 4 > 5 > sqlite> > > To explain a bit more about how that one works: > > - Imagine a table with two columns: element, and remainder. The > ?element? of each row contains a single element of the list, whereas > ?remainder? contains everything that comes after it. > - We start off by seeding the table with a single row containing a NULL > element, and a ?remainder? containing the list we want to transform: > ?1,2,3,4,5? > - Then we define the recursive CTE to generate a new row containing the > first element of the list (everything before the first comma), and the > remainder (everything after the first comma) > - Unfortunately sqlite doesn?t support the IF syntax, so I?m using the > slightly more confusing (but way more powerful) CASE syntax, but > basically > it?s saying ?if there is a comma, then everything before is an element, > and > everything after is the remainder. If there is no comma, then the whole > thing is the element, and there is no remainder.? > - Finally, it says ?keep recursing until there is no more remainder? > - The result is a table containing one row per element, plus one NULL > element (which is what we seeded the CTE?s result table with) > - So we select everything except for that NULL element, and voila, we?ve > converted a comma-separated list into a table, purely in SQL. > > Now that might all sound academic. But it?s not uncommon for a database to > have some table (eg, ?nameValuePairs?) that stores arbitrary data ? and > sometimes that data is a comma separated list. Without the CTE, you can get > as far as obtaining that list in the form of a string, but then you need to > post-process it with Python or some scripting language to complete your > analysis. With this CTE, you can convert the list into a standard result > set, and then continue your analysis with the full power of SQL ? including > joining that result set against other tables. > > For example, consider if you had some Github-style commenting system, where > users could refer to each other inline by username (eg, ?This fixes a bug > introduced by @quinthar <https://github.com/quinthar>, thanks @genintho > <https://github.com/genintho> for the tip!?). And imagine this were stored > as a simple string in a ?comments? table. Normally it would be impossible > to write a pure SQL query to determine which users were referenced inside > the comment, so as to look up which to notify. But you could modify the CTE > above to replace the ?comma? with an ?@?, and thus generate a list of all > usernames mentioned in a comment. Then you could take that output and join > against your account table to look up the email address for each of those > users ? producing a clean list of distinct email addresses to notify. > Indeed, you could then put that straight into an INSERT statement to insert > a command into a ?notifications? table, queueing the email for later > sending. > > So CTEs enable transformations to happen inside SQL that previously would > force you to switch languages. We?ve already talked about list-based > transformations. But the last I?ll discuss sounds crazy: processing > hierarchical information inside SQL. > > Everybody knows relational databases are based on tables containing rows, > and that you can join those tables together to produce ? in effect ? super > large tables that can be analyzed as one. And many people know that you can > join tables against themselves to do pretty tricky analyses. But without a > CTE, at best you can join something against itself a fixed number of times > ? however many times you write the query to do. Enter the CTE and you can > effectively join a table against itself as many times as it takes to > complete the analysis. > > Take for example, a company?s expense report approval hierarchy. This can > be stored in a simple ?company? table, where everybody?s expense report > ?approver? can be easily looked up by the employee?s ?name?. The approval > structure for any specific employee is both their approver, their > approver?s approver, and so on until you reach someone who has no approver. > The hierarchy might have different ?depths? for different employees, so > there?s no way to write a non-recursive query to select an employee?s > approvers directly. But you can with a recursive CTE: > > sqlite> CREATE TABLE company ( name, approver ); > sqlite> INSERT INTO company VALUES ( 'David', NULL ), ( 'Matt', > 'David' ), ( 'Jason', 'David' ), ( 'Ryan', 'David' ), ( 'Mike', 'Matt' > ), ( 'Carlos', 'Matt' ), ( 'Garrett', 'Jason' ), ( 'Puneet', 'Jason' > ), ( 'Joanie', 'Ryan' ); > > sqlite> WITH RECURSIVE approvers(x) AS ( > SELECT 'Joanie' > UNION ALL > SELECT company.approver > FROM company, approvers > WHERE company.name=approvers.x AND company.approver IS NOT > NULL > ) > SELECT * FROM approvers; > Joanie > Ryan > David > sqlite> > > The upshot of all this is Common Table Expressions allow for extremely > powerful data generation, transformation, and hierarchical querying > capabilities ? directly from within SQL. They take a bit to wrap your head > around, but they open up an entirely new level of potential for in-database > analysis and logic. Enjoy! > _______________________________________________ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >