Re: [sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread Frank Millman

On Mon Sep 11, 2017 at 11:15:22AM +0200, no...@null.net wrote:
> SELECT
> acc_no,
> acc_name,
> SUM(i_90.invoice_bal) AS 90_days,
> SUM(i_current.invoice_bal) AS current
> FROM
>debtors_table
> LEFT JOIN
> invoices i_90
> ON
> i_90.debtor_id = debtors_table.id AND
> i_90.invoice_date > date_5 AND
> i_90.invoice_date <= date_4
> LEFT JOIN
> invoices i_current
> ON
> i_current.debtor_id = debtors_table.id AND
> i_current.invoice_date > date_2 AND
> i_current.invoice_date <= date_1
> 
> I should also mention for completeness the need for:
> 
> GROUP BY
> acc_no,
> acc_name
> 
> ORDER BY
> ...


Very interesting ideas. I have a lot of experimenting to do!

Thanks very much, Mark.

Frank
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread nomad
On Mon Sep 11, 2017 at 11:15:22AM +0200, no...@null.net wrote:
> SELECT
> acc_no,
> acc_name,
> SUM(i_90.invoice_bal) AS 90_days,
> SUM(i_current.invoice_bal) AS current
> FROM
>debtors_table
> LEFT JOIN
> invoices i_90
> ON
> i_90.debtor_id = debtors_table.id AND
> i_90.invoice_date > date_5 AND
> i_90.invoice_date <= date_4
> LEFT JOIN
> invoices i_current
> ON
> i_current.debtor_id = debtors_table.id AND
> i_current.invoice_date > date_2 AND
> i_current.invoice_date <= date_1

I should also mention for completeness the need for:

GROUP BY
acc_no,
acc_name

> ORDER BY
> ...

-- 
Mark Lawrence
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread nomad
On Mon Sep 11, 2017 at 08:40:41AM +0200, Frank Millman wrote:
> I could structure it like this (pseudo code) -
> 
> SELECT acc_no, acc_name,
> (SELECT SUM(invoice_bal) FROM invoices
>   WHERE invoice_date <= date_5) AS 120_days, 
> (SELECT SUM(invoice_bal) FROM invoices
>   WHERE invoice_date > date_5 AND invoice_date <= date_4) AS 90_days, 
> (SELECT SUM(invoice_bal) FROM invoices
>   WHERE invoice_date > date_4 AND invoice_date <= date_3) AS 60_days, 
> (SELECT SUM(invoice_bal) FROM invoices
>   WHERE invoice_date > date_3 AND invoice_date <= date_2) AS 30_days, 
> (SELECT SUM(invoice_bal) FROM invoices
>   WHERE invoice_date > date_2 AND invoice_date <= date_1) AS current
> FROM debtors_table
> 
> This works, but it requires 5 separate scans of the invoice table,
> which is inefficient.

You can achieve the same result with a single table scan using CASE
statements:

SELECT
acc_no,
acc_name,
SUM(
CASE WHEN
   invoice_date <= date_5
THEN
invoice_bal
ELSE
0
END
) AS 120_days, 
SUM(
CASE WHEN
invoice_date > date_5 AND invoice_date <= date_4
THEN
invoice_bal
ELSE
0
END
) AS 90_days, 
 
-- ...
 
SUM(
CASE WHEN
invoice_date > date_2 AND invoice_date <= date_1
THEN
invoice_bal
ELSE
0
END
) AS current
FROM
   debtors_table
LEFT JOIN
invoices
ON
invoices.debtor_id = debtors_table.id

If your time periods are not exhaustive (covering all time) then you
can perhaps be more efficient again by moving the filtering into the
join clause:

SELECT
acc_no,
acc_name,
SUM(i_90.invoice_bal) AS 90_days,
SUM(i_current.invoice_bal) AS current
FROM
   debtors_table
LEFT JOIN
invoices i_90
ON
i_90.debtor_id = debtors_table.id AND
i_90.invoice_date > date_5 AND
i_90.invoice_date <= date_4
LEFT JOIN
invoices i_current
ON
i_current.debtor_id = debtors_table.id AND
i_current.invoice_date > date_2 AND
i_current.invoice_date <= date_1
ORDER BY
...

For the above an index on invoices(debtor_id,invoice_date) would be
appropriate.

Also in case you weren't aware, the SUM() function in my second example
could produce NULL values. You may be better served in that instance by
CAST(TOTAL(i_current.invoice_bal) AS INTEGER).

-- 
Mark Lawrence
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread Frank Millman
Clemens Ladisch wrote:

> Frank Millman wrote:

> > I changed it to use a WITH clause to make one scan of the invoice table and 
> > store the results.

> The WITH clause itself does not store anything; it's syntactic sugar [...] To 
> force the ageing results to be stored, you'd have to create a temporary table.

Thanks – I did not know that. It explains everything!
Frank

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread Clemens Ladisch
Frank Millman wrote:
> SELECT acc_no, acc_name,
> (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date <= date_5) 
> AS 120_days,
> (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_5 
> AND invoice_date <= date_4) AS 90_days,
> (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_4 
> AND invoice_date <= date_3) AS 60_days,
> (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_3 
> AND invoice_date <= date_2) AS 30_days,
> (SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_2 
> AND invoice_date <= date_1) AS current
> FROM debtors_table

This query does not make sense without a GROUP BY and a correlation,
which would affect how to optimize it.

> This works, but it requires 5 separate scans of the invoice table,
> which is inefficient.

If there is an index on the invoice_date, it's efficient.

Assuming that there is an account_numer filter you did not show, the
index should look like this:

  CREATE INDEX invoices_accno_date_idx ON invoices(account_number, 
invoice_date);

Alternatively, to create a covering index 
(http://www.sqlite.org/queryplanner.html#covidx)
for this query, add the invoice_bal column.

> I changed it to use a WITH clause to make one scan of the invoice
> table and store the results.

The WITH clause itself does not store anything; it's syntactic
sugar.

> WITH ageing AS (...)
> SELECT acc_no, acc_name,
> (SELECT balance FROM ageing WHERE account_number = acc_no AND 
> bucket_number = 5) AS 120_days,
> (SELECT balance FROM ageing WHERE account_number = acc_no AND 
> bucket_number = 4) AS 90_days,
> (SELECT balance FROM ageing WHERE account_number = acc_no AND 
> bucket_number = 3) AS 60_days,
> (SELECT balance FROM ageing WHERE account_number = acc_no AND 
> bucket_number = 2) AS 30_days,
> (SELECT balance FROM ageing WHERE account_number = acc_no AND 
> bucket_number = 1) AS current
> FROM debtors_table
>
> It works, but it takes about 3.5 seconds to execute.

The ageing query is likely to be executed five times for each row in the
outer query.  (This might be efficient with the above the index.)

To force the ageing results to be stored, you'd have to create
a temporary table.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Selecting data from WITH clause is very slow

2017-09-11 Thread Frank Millman
Hi all

I am writing an accounting system, and want to select ‘aged balances’ from the 
debtors table.

To do this, I need to assign each outstanding invoice to an ageing ‘bucket’, 
where each bucket contains the sum of invoices where date > start_date and date 
<= end_date. There will be five buckets altogether. In my test database I have 
12 debtors and a few thousand invoices.

I could structure it like this (pseudo code) -

SELECT acc_no, acc_name,
(SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date <= date_5) AS 
120_days, 
(SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_5 AND 
invoice_date <= date_4) AS 90_days, 
(SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_4 AND 
invoice_date <= date_3) AS 60_days, 
(SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_3 AND 
invoice_date <= date_2) AS 30_days, 
(SELECT SUM(invoice_bal) FROM invoices WHERE invoice_date > date_2 AND 
invoice_date <= date_1) AS current
FROM debtors_table

This works, but it requires 5 separate scans of the invoice table, which is 
inefficient.

I changed it to use a WITH clause to make one scan of the invoice table and 
store the results.

I won’t show the WITH clause here, as it is not the reason for the question, 
and would just add clutter. If I run the WITH clause separately, it executes in 
about 0.15 seconds, and generates 60 rows with 3 columns – account number, 
bucket number, and accumulated balance.

In the main clause I now have this -

WITH ageing AS (...)
SELECT acc_no, acc_name,
(SELECT balance FROM ageing WHERE account_number = acc_no AND 
bucket_number = 5) AS 120_days,
(SELECT balance FROM ageing WHERE account_number = acc_no AND 
bucket_number = 4) AS 90_days,
(SELECT balance FROM ageing WHERE account_number = acc_no AND 
bucket_number = 3) AS 60_days,
(SELECT balance FROM ageing WHERE account_number = acc_no AND 
bucket_number = 2) AS 30_days,
(SELECT balance FROM ageing WHERE account_number = acc_no AND 
bucket_number = 1) AS current
FROM debtors_table

It works, but it takes about 3.5 seconds to execute.

Exactly the same exercise on Sql Server takes 0.17 seconds, and on PostgreSQL 
0.22 seconds.

I appreciate that the temporary table created in the WITH clause is not 
indexed, but 3.5 seconds seems a long time to extract the data from 60 rows.

Is there any way to speed this up?

Frank Millman
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users