There is an index on securities (6.2 million rows). On the dates query from the with clause, WITH dates AS (SELECT DISTINCT calc_date FROM securities), there can be no index.
In your example, there are 1000 rows of indexed dates. The runtime of the full example (four steps) is: Run Time: real 0.002 user 0.000484 sys 0.000877 Run Time: real 0.004 user 0.003905 sys 0.000404 Run Time: real 0.001 user 0.001174 sys 0.000259 Run Time: real 0.007 user 0.006640 sys 0.000405 which is instantaneous, but unfortunately not reproducible with my datasource. Actually, even without the index the execution time of your example is still very reasonable: Run Time: real 0.000 user 0.000225 sys 0.000048 Run Time: real 0.004 user 0.003767 sys 0.000181 Run Time: real 0.409 user 0.401034 sys 0.001259 On Sat, Sep 12, 2015 at 4:27 PM, Richard Hipp <drh at sqlite.org> wrote: > On 9/12/15, Aurel Wisse <aurel.wisse at gmail.com> wrote: > > @Luuk : I tried your query and I cancelled after 9 minutes when it wasn't > > finished. > > > > @Richard : The query isn't done after 15 minutes. I added a small > > modification, reinserting the WITH clause inside the recursive WITH (and > > using dates instead of securities in the remainder of the query) : > > Did you create the index on securities(calc_date) first, as shown in > my example. I should have highlighted that point - the index is > important for performance. > > > > > CREATE TEMP TABLE offsets AS > > > > WITH RECURSIVE dtoff(calc_date, tmw) AS ( > > > > WITH dates AS (SELECT DISTINCT calc_date FROM securities) > > > > SELECT NULL, (SELECT min(calc_date) FROM dates) > > ... > > This is inevitable as "securities" is a table with 6.2 million rows. This > > is the schema: > > > > CREATE TABLE security (calc_date TEXT, security_code INT, mx_full REAL, > > mx_adj REAL, mx_std REAL, fu_full REAL, fu_adj REAL, few_full REAL, > few_adj > > REAL); > > > > > > There are about 4000 distinct calc_date in securities. I want to avoid an > > intermediate table with distinct dates. If I were to use an intermediate > > table, the offset would be trivial to create on an ordered version of > > dates, self joined with the JOIN clause d1.rowid+1=d2.rowid. > > > > > > The expected output is (first five rows): > > > > 2000-11-30|2000-12-01 > > > > 2000-12-01|2000-12-04 > > > > 2000-12-04|2000-12-05 > > > > 2000-12-05|2000-12-06 > > > > 2000-12-06|2000-12-07 > > > > > > I have the impression that nontrivial SELECT queries in result columns > are > > quite expensive. > > > > > > The fact of the matter is that the original recursive query, up to this > > point, is the most efficient solution with the non recursive query > second. > > Richard: Any chance aggregate recursive queries can be reactivated ? > > > > > > Thanks, > > > > > > Aurel > > > > > > > > On Sat, Sep 12, 2015 at 12:25 PM, Richard Hipp <drh at sqlite.org> wrote: > > > >> On 9/12/15, Aurel Wisse <aurel.wisse at gmail.com> wrote: > >> > On 9/11/15, Richard Hipp <drh at sqlite.org> wrote: > >> > > >> > Here is the example. > >> > >> Have you considered using code like the following instead: > >> > >> -- Create a table "securities(calc_date)" and fill it with lots > >> -- with lots of dates in a crazy order. > >> -- > >> CREATE TABLE securities(calc_date); > >> WITH RECURSIVE c(x) AS (VALUES(1) UNION SELECT x+1 FROM c WHERE x<1000) > >> INSERT INTO securities(calc_date) > >> SELECT datetime(2440587.5 + ((x*123456789012345)%16700000)/1000.0) > >> FROM > >> c; > >> CREATE INDEX securities_calcdate ON securities(calc_date); > >> > >> > >> -- Compute the "offsets(calc_date, tmw)" which has one entry for each > >> -- calc_date in securities and for which tmw is the first non-Sunday > date > >> -- in securities that comes after calc_date. > >> -- > >> CREATE TABLE offsets AS > >> WITH RECURSIVE dtoff(calc_date, tmw) AS ( > >> SELECT NULL, (SELECT min(calc_date) FROM securities) > >> UNION > >> SELECT dtoff.tmw, (SELECT min(securities.calc_date) FROM securities > >> WHERE securities.calc_date>dtoff.tmw > >> AND strftime('%w',securities.calc_date)<>'0') > >> FROM dtoff > >> ) > >> SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date; > >> SELECT * FROM offsets; > >> > >> > >> -- > >> D. Richard Hipp > >> drh at sqlite.org > >> _______________________________________________ > >> sqlite-users mailing list > >> sqlite-users at mailinglists.sqlite.org > >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > >> > > > > > > > > -- > > Aurel Wisse > > (514) 814-5421 > > _______________________________________________ > > sqlite-users mailing list > > sqlite-users at mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > > -- > D. Richard Hipp > drh at sqlite.org > _______________________________________________ > sqlite-users mailing list > sqlite-users at mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > -- Aurel Wisse (514) 814-5421