Re: [sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread Frank Millman

On Mon Sep 11, 2017 at 11:15:22AM +0200, no...@null.net wrote:
> SELECT
> acc_no,
> acc_name,
> SUM(i_90.invoice_bal) AS 90_days,
> SUM(i_current.invoice_bal) AS current
> FROM
>debtors_table
> LEFT JOIN
> invoices i_90
> ON
> i_90.debtor_id = debtors_table.id AND
> i_90.invoice_date > date_5 AND
> i_90.invoice_date <= date_4
> LEFT JOIN
> invoices i_current
> ON
> i_current.debtor_id = debtors_table.id AND
> i_current.invoice_date > date_2 AND
> i_current.invoice_date <= date_1
> 
> I should also mention for completeness the need for:
> 
> GROUP BY
> acc_no,
> acc_name
> 
> ORDER BY
> ...


Very interesting ideas. I have a lot of experimenting to do!

Thanks very much, Mark.

Frank
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread nomad
On Mon Sep 11, 2017 at 11:15:22AM +0200, no...@null.net wrote:
> SELECT
> acc_no,
> acc_name,
> SUM(i_90.invoice_bal) AS 90_days,
> SUM(i_current.invoice_bal) AS current
> FROM
>debtors_table
> LEFT JOIN
> invoices i_90
> ON
> i_90.debtor_id = debtors_table.id AND
> i_90.invoice_date > date_5 AND
> i_90.invoice_date <= date_4
> LEFT JOIN
> invoices i_current
> ON
> i_current.debtor_id = debtors_table.id AND
> i_current.invoice_date > date_2 AND
> i_current.invoice_date <= date_1

I should also mention for completeness the need for:

GROUP BY
acc_no,
acc_name

> ORDER BY
> ...

-- 
Mark Lawrence
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread nomad
On Mon Sep 11, 2017 at 08:40:41AM +0200, Frank Millman wrote:
> I could structure it like this (pseudo code) -
> 
> SELECT acc_no, acc_name,
> (SELECT SUM(invoice_bal) FROM invoices
>   WHERE invoice_date <= date_5) AS 120_days, 
> (SELECT SUM(invoice_bal) FROM invoices
>   WHERE invoice_date > date_5 AND invoice_date <= date_4) AS 90_days, 
> (SELECT SUM(invoice_bal) FROM invoices
>   WHERE invoice_date > date_4 AND invoice_date <= date_3) AS 60_days, 
> (SELECT SUM(invoice_bal) FROM invoices
>   WHERE invoice_date > date_3 AND invoice_date <= date_2) AS 30_days, 
> (SELECT SUM(invoice_bal) FROM invoices
>   WHERE invoice_date > date_2 AND invoice_date <= date_1) AS current
> FROM debtors_table
> 
> This works, but it requires 5 separate scans of the invoice table,
> which is inefficient.

You can achieve the same result with a single table scan using CASE
statements:

SELECT
acc_no,
acc_name,
SUM(
CASE WHEN
   invoice_date <= date_5
THEN
invoice_bal
ELSE
0
END
) AS 120_days, 
SUM(
CASE WHEN
invoice_date > date_5 AND invoice_date <= date_4
THEN
invoice_bal
ELSE
0
END
) AS 90_days, 
 
-- ...
 
SUM(
CASE WHEN
invoice_date > date_2 AND invoice_date <= date_1
THEN
invoice_bal
ELSE
0
END
) AS current
FROM
   debtors_table
LEFT JOIN
invoices
ON
invoices.debtor_id = debtors_table.id

If your time periods are not exhaustive (covering all time) then you
can perhaps be more efficient again by moving the filtering into the
join clause:

SELECT
acc_no,
acc_name,
SUM(i_90.invoice_bal) AS 90_days,
SUM(i_current.invoice_bal) AS current
FROM
   debtors_table
LEFT JOIN
invoices i_90
ON
i_90.debtor_id = debtors_table.id AND
i_90.invoice_date > date_5 AND
i_90.invoice_date <= date_4
LEFT JOIN
invoices i_current
ON
i_current.debtor_id = debtors_table.id AND
i_current.invoice_date > date_2 AND
i_current.invoice_date <= date_1
ORDER BY
...

For the above an index on invoices(debtor_id,invoice_date) would be
appropriate.

Also in case you weren't aware, the SUM() function in my second example
could produce NULL values. You may be better served in that instance by
CAST(TOTAL(i_current.invoice_bal) AS INTEGER).

-- 
Mark Lawrence
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread Frank Millman
Clemens Ladisch wrote:

> Frank Millman wrote:

> > I changed it to use a WITH clause to make one scan of the invoice table and 
> > store the results.

> The WITH clause itself does not store anything; it's syntactic sugar [...] To 
> force the ageing results to be stored, you'd have to create a temporary table.

Thanks – I did not know that. It explains everything!
Frank

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread Clemens Ladisch
Frank Millman wrote:
> SELECT acc_no, acc_name,
> (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date <= date_5) 
> AS 120_days,
> (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_5 
> AND invoice_date <= date_4) AS 90_days,
> (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_4 
> AND invoice_date <= date_3) AS 60_days,
> (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_3 
> AND invoice_date <= date_2) AS 30_days,
> (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_2 
> AND invoice_date <= date_1) AS current
> FROM debtors_table

This query does not make sense without a GROUP BY and a correlation,
which would affect how to optimize it.

> This works, but it requires 5 separate scans of the invoice table,
> which is inefficient.

If there is an index on the invoice_date, it's efficient.

Assuming that there is an account_numer filter you did not show, the
index should look like this:

  CREATE INDEX invoices_accno_date_idx ON invoices(account_number, 
invoice_date);

Alternatively, to create a covering index 
(http://www.sqlite.org/queryplanner.html#covidx)
for this query, add the invoice_bal column.

> I changed it to use a WITH clause to make one scan of the invoice
> table and store the results.

The WITH clause itself does not store anything; it's syntactic
sugar.

> WITH ageing AS (...)
> SELECT acc_no, acc_name,
> (SELECT balance FROM ageing WHERE account_number = acc_no AND 
> bucket_number = 5) AS 120_days,
> (SELECT balance FROM ageing WHERE account_number = acc_no AND 
> bucket_number = 4) AS 90_days,
> (SELECT balance FROM ageing WHERE account_number = acc_no AND 
> bucket_number = 3) AS 60_days,
> (SELECT balance FROM ageing WHERE account_number = acc_no AND 
> bucket_number = 2) AS 30_days,
> (SELECT balance FROM ageing WHERE account_number = acc_no AND 
> bucket_number = 1) AS current
> FROM debtors_table
>
> It works, but it takes about 3.5 seconds to execute.

The ageing query is likely to be executed five times for each row in the
outer query.  (This might be efficient with the above the index.)

To force the ageing results to be stored, you'd have to create
a temporary table.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users