The fastest solution is actually the temporary table:

CREATE TEMP TABLE uniqdt AS SELECT DISTINCT calc_date FROM securities

WHERE strftime('%w', calc_date)<>'0' ORDER BY calc_date;

CREATE TEMP TABLE dtotemp AS

SELECT ud1.calc_date AS calc_date, ud2.calc_date AS tmw

FROM uniqdt AS ud1 INNER JOIN uniqdt AS ud2

ON ud2.rowid = ud1.rowid+1

ORDER BY calc_date;

DROP TABLE uniqdt;


Execution time:

Run Time: real 3.601 user 3.342178 sys 0.246156

Run Time: real 0.007 user 0.005320 sys 0.001221

Run Time: real 0.000 user 0.000218 sys 0.000047

Still Richard : How about allowing recursive aggregate queries again ?


On Sat, Sep 12, 2015 at 4:45 PM, Aurel Wisse <aurel.wisse at gmail.com> wrote:

> There is an index on securities (6.2 million rows). On the dates query
> from the with clause,
> WITH dates AS (SELECT DISTINCT calc_date FROM securities), there can be no
> index.
>
> In your example, there are 1000 rows of indexed dates. The runtime of the
> full example (four steps) is:
>
> Run Time: real 0.002 user 0.000484 sys 0.000877
>
> Run Time: real 0.004 user 0.003905 sys 0.000404
>
> Run Time: real 0.001 user 0.001174 sys 0.000259
>
> Run Time: real 0.007 user 0.006640 sys 0.000405
>
> which is instantaneous, but unfortunately not reproducible with my
> datasource. Actually, even without the index the execution time of your
> example is still very reasonable:
>
> Run Time: real 0.000 user 0.000225 sys 0.000048
>
> Run Time: real 0.004 user 0.003767 sys 0.000181
>
> Run Time: real 0.409 user 0.401034 sys 0.001259
>
>
> On Sat, Sep 12, 2015 at 4:27 PM, Richard Hipp <drh at sqlite.org> wrote:
>
>> On 9/12/15, Aurel Wisse <aurel.wisse at gmail.com> wrote:
>> > @Luuk : I tried your query and I cancelled after 9 minutes when it
>> wasn't
>> > finished.
>> >
>> > @Richard : The query isn't done after 15 minutes. I added a small
>> > modification, reinserting the WITH clause inside the recursive WITH (and
>> > using dates instead of securities in the remainder of the query) :
>>
>> Did you create the index on securities(calc_date) first, as shown in
>> my example.  I should have highlighted that point - the index is
>> important for performance.
>>
>> >
>> > CREATE TEMP TABLE offsets AS
>> >
>> >   WITH RECURSIVE dtoff(calc_date, tmw) AS (
>> >
>> > WITH dates AS (SELECT DISTINCT calc_date FROM securities)
>> >
>> >     SELECT NULL, (SELECT min(calc_date) FROM dates)
>> > ...
>> > This is inevitable as "securities" is a table with 6.2 million rows.
>> This
>> > is the schema:
>> >
>> > CREATE TABLE security (calc_date TEXT, security_code INT, mx_full REAL,
>> > mx_adj REAL, mx_std REAL, fu_full REAL, fu_adj REAL, few_full REAL,
>> few_adj
>> > REAL);
>> >
>> >
>> > There are about 4000 distinct calc_date in securities. I want to avoid
>> an
>> > intermediate table with distinct dates. If I were to use an intermediate
>> > table, the offset would be trivial to create on an ordered version of
>> > dates, self joined with the JOIN clause d1.rowid+1=d2.rowid.
>> >
>> >
>> > The expected output is (first five rows):
>> >
>> > 2000-11-30|2000-12-01
>> >
>> > 2000-12-01|2000-12-04
>> >
>> > 2000-12-04|2000-12-05
>> >
>> > 2000-12-05|2000-12-06
>> >
>> > 2000-12-06|2000-12-07
>> >
>> >
>> > I have the impression that nontrivial SELECT queries in result columns
>> are
>> > quite expensive.
>> >
>> >
>> > The fact of the matter is that the original recursive query, up to this
>> > point, is the most efficient solution with the non recursive query
>> second.
>> > Richard: Any chance aggregate recursive queries can be reactivated ?
>> >
>> >
>> > Thanks,
>> >
>> >
>> > Aurel
>> >
>> >
>> >
>> > On Sat, Sep 12, 2015 at 12:25 PM, Richard Hipp <drh at sqlite.org> wrote:
>> >
>> >> On 9/12/15, Aurel Wisse <aurel.wisse at gmail.com> wrote:
>> >> > On 9/11/15, Richard Hipp <drh at sqlite.org> wrote:
>> >> >
>> >> > Here is the example.
>> >>
>> >> Have you considered using code like the following instead:
>> >>
>> >> -- Create a table "securities(calc_date)" and fill it with lots
>> >> -- with lots of dates in a crazy order.
>> >> --
>> >> CREATE TABLE securities(calc_date);
>> >> WITH RECURSIVE c(x) AS (VALUES(1) UNION SELECT x+1 FROM c WHERE x<1000)
>> >> INSERT INTO securities(calc_date)
>> >>    SELECT datetime(2440587.5 + ((x*123456789012345)%16700000)/1000.0)
>> >> FROM
>> >> c;
>> >> CREATE INDEX securities_calcdate ON securities(calc_date);
>> >>
>> >>
>> >> -- Compute the "offsets(calc_date, tmw)" which has one entry for each
>> >> -- calc_date in securities and for which tmw is the first non-Sunday
>> date
>> >> -- in securities that comes after calc_date.
>> >> --
>> >> CREATE TABLE offsets AS
>> >>   WITH RECURSIVE dtoff(calc_date, tmw) AS (
>> >>     SELECT NULL, (SELECT min(calc_date) FROM securities)
>> >>     UNION
>> >>     SELECT dtoff.tmw, (SELECT min(securities.calc_date) FROM securities
>> >>                         WHERE securities.calc_date>dtoff.tmw
>> >>                           AND strftime('%w',securities.calc_date)<>'0')
>> >>       FROM dtoff
>> >>   )
>> >>   SELECT * from dtoff WHERE calc_date NOTNULL ORDER BY calc_date;
>> >> SELECT * FROM offsets;
>> >>
>> >>
>> >> --
>> >> D. Richard Hipp
>> >> drh at sqlite.org
>> >> _______________________________________________
>> >> sqlite-users mailing list
>> >> sqlite-users at mailinglists.sqlite.org
>> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> >>
>> >
>> >
>> >
>> > --
>> > Aurel Wisse
>> > (514) 814-5421
>> > _______________________________________________
>> > sqlite-users mailing list
>> > sqlite-users at mailinglists.sqlite.org
>> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> >
>>
>>
>> --
>> D. Richard Hipp
>> drh at sqlite.org
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
>
> --
> Aurel Wisse
> (514) 814-5421
>



-- 
Aurel Wisse
(514) 814-5421

Reply via email to