On Nov 12, 2017 11:06 AM, "Balaji Ramanathan" <balaji.ramanat...@gmail.com>
wrote:

Hi,

    I have a table (mytable) with several numerical fields.  Let us call
them F1, F2, etc., etc.  The table also has a row number field (RowNumber)
that I make sure has incremental values from 1 through the number of rows
in the table.

    In addition to printing out the values of F1 through Fn, I also want to
print out, for each row of the table, the running sums and averages of
these fields.  There are two averages:  one that uses only the count of
non-null values of each field as the denominator and another that uses the
count of both null and non-null values of each field.  So, I wrote a query
as below to calculate and print those values:

select RowNumber,
F1,
(select sum(F1) from mytable where mytable.RowNumber <= T.RowNumber) as
sumF1,
(select avg(F1) from mytable where mytable.RowNumber <= T.RowNumber) as
avgF1, --average excluding null values
(select sum(F1) from mytable where mytable.RowNumber <=
T.RowNumber)/T.RowNumber as OverallAvgF1, --average including null values
F2,
(select sum(F2) from mytable where mytable.RowNumber <= T.RowNumber) as
sumF2,
(select avg(F2) from mytable where mytable.RowNumber <= T.RowNumber) as
avgF2,
(select sum(F2) from mytable where mytable.RowNumber <=
T.RowNumber)/T.RowNumber as OverallAvgF2,
--Other fields, their running sums and averages go here
from mytable as T order by RowNumber

    I have about 3000 rows in my table and about 10 fields I am doing this
for.  When I select F1 through F10 alone, the query runs in negligible
time.  But when I run the above query with running sums and averages
included (total of 40 columns instead of 10 columns), the time increases to
well over 5 minutes.

    When I run the query at the SQLite command line, the first 1000 or so
records are produced very fast, but as the row number increases, I can see
SQLite struggling to produce outputs.  By the time I get to the end of the
query, the records are being produced at about a rate of only 1 or 2 per
second.

    Is there anything I can do to reduce the time taken?  Is this the most
efficient way to query for running sums and averages?  Any other ideas to
get what I need any quicker?

    The added wrinkle in all this is that mytable is actually not a table,
but a view.  F1 through F10 are computed from various tables and brought
together in this view.  So, I don't think I can create indexes on any of
these fields.  I have thought about creating a temp table out of this view
so that I can index RowNumber (I don't know whether it will help) but it
goes against my minimalist instincts.


I am *FAR* from a SQLite (or SQL) guru, but it seems obvious to me that
your calculations for sumF1, avgF1, OverallAvgF1, etc as written are
running in quadratic time based on the number of rows in table T.  Worse
yet, you're recalculating all these values entirely from scratch every time
you're recalculating them, rather than use the previously calculated value.

Is there any way you can calculate table T with just RowNumber, F1, F2,
..., F10, and then as you subsequently step through T row by row calculate
all the sums and averages as you step, saving the calculations for each row
as you calculate them in temporary variables for use in calculating the
values for the next row?  This might be doable in pure SQL code, or you
might have to do it in whatever language you're making calls to SQLite
from.  That should be linear time, at worst 2 times the number of rows in T
(once to construct T, once to step through it).

Be well.



Joseph
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to