Re: [sqlite] Running sums and averages
Memory is cheap and most servers have plenty. Processors are fast and most servers have multiple with many cores. Select the entire table of columns you need into memory. Write a little code. No it won't scale very well into millions of rows but I could easily run a test and I will bet many mnay many dollars that processing the sums in memory is orders of magnitude faster than SQL. You shouldn't even need to read the entire table (or view) into memory: just read row-by-row, and for each field, keep a running total and the count of non-NULL values. From these you can calculate your total and both types of average. Graham yes .. that is even better ! Dennis ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Running sums and averages
OK, thanks for the short and sweet answer! I just wanted to make sure I wasn't missing something. Balaji Ramanathan > -- Forwarded message -- > From: Simon Slavin <slav...@bigfraud.org> > To: SQLite mailing list <sqlite-users@mailinglists.sqlite.org> > Cc: > Bcc: > Date: Sun, 12 Nov 2017 16:37:24 + > Subject: Re: [sqlite] Running sums and averages > > > On 12 Nov 2017, at 4:05pm, Balaji Ramanathan <balaji.ramanat...@gmail.com> > wrote: > > > Is there anything I can do to reduce the time taken? > > Do it in your favourite programming language rather than SQL. > Congratulations on finding a way of doing it in SQL, but running sums and > averages are trivial in a programming language and will be far faster and > you can be as fussy as you like about whether nulls are counted and such > things. > > Simon. > > > -- Forwarded message -- > From: "Joseph R. Justice" <jayare...@gmail.com> > > I am *FAR* from a SQLite (or SQL) guru, but it seems obvious to me that > your calculations for sumF1, avgF1, OverallAvgF1, etc as written are > running in quadratic time based on the number of rows in table T. Worse > yet, you're recalculating all these values entirely from scratch every time > you're recalculating them, rather than use the previously calculated value. > > Is there any way you can calculate table T with just RowNumber, F1, F2, > ..., F10, and then as you subsequently step through T row by row calculate > all the sums and averages as you step, saving the calculations for each row > as you calculate them in temporary variables for use in calculating the > values for the next row? This might be doable in pure SQL code, or you > might have to do it in whatever language you're making calls to SQLite > from. That should be linear time, at worst 2 times the number of rows in T > (once to construct T, once to step through it). > > Be well. > > > > Joseph > > > > -- Forwarded message -- > From: Dennis Clarke <dcla...@blastwave.org> > > < Simon correctly advised > > > Do it in your favourite programming language rather than SQL. > > Let me be even more clear : > > Memory is cheap and most servers have plenty. > > Processors are fast and most servers have multiple with many cores. > > Select the entire table of columns you need into memory. > Write a little code. > > No it won't scale very well into millions of rows but I could easily run > a test and I will bet many mnay many dollars that processing the sums in > memory is orders of magnitude faster than SQL. > > > Dennis > > ps: if your db is MySQL or Oracle db then the problem is trivial with > the C API > > > > -- Forwarded message -- > From: Keith Medcalf <kmedc...@dessus.com> > Subject: Re: [sqlite] Running sums and averages > > EXPLAIN QUERY PLAN > > is the first step. > > -- Forwarded message -- > From: Graham Holden <sql...@aldurslair.com> > > You shouldn't even need to read the entire table (or view) into > memory: just read row-by-row, and for each field, keep a running > total and the count of non-NULL values. From these you can calculate > your total and both types of average. > > Graham > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Running sums and averages
As other folks have mentioned, doing it in an external language is going to be easiest. That being said, CTE's are almost a full language in themselves. I'm just gonna go with 1 value here for the example, but how about something like... with recursive foo (RowNumber, F1, sumF1, OverallAvgF1, avgF1, F1notNull) as( select rowNumber, F1, ifnull(F1, 0.0), ifnull(F1, 0.0), ifnull(F1, 0.0), F1 is not null from mytable where RowNumber = 1 union all select mytable.RowNumber, mytable.F1, foo.sumF1 + ifnull(mytable.F1, 0.0), (foo.sumF1 + ifnull(mytable.F1, 0.0)) / mytable.RowNumber, (foo.sumF1 + ifnull(mytable.F1, 0.0)) / (foo.F1notNull + (mytable.F1 is not null)), foo.F1notNull + (mytable.F1 is not null) from foo inner join mytable on mytable.RowNumber = foo.RowNumber + 1) select RowNumber, F1, sumF1, OverallAvgF1, avgF1 from foo; -Original Message- From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Balaji Ramanathan Sent: Sunday, November 12, 2017 11:06 AM To: sqlite-users@mailinglists.sqlite.org Subject: [sqlite] Running sums and averages Hi, I have a table (mytable) with several numerical fields. Let us call them F1, F2, etc., etc. The table also has a row number field (RowNumber) that I make sure has incremental values from 1 through the number of rows in the table. In addition to printing out the values of F1 through Fn, I also want to print out, for each row of the table, the running sums and averages of these fields. There are two averages: one that uses only the count of non-null values of each field as the denominator and another that uses the count of both null and non-null values of each field. So, I wrote a query as below to calculate and print those values: select RowNumber, F1, (select sum(F1) from mytable where mytable.RowNumber <= T.RowNumber) as sumF1, (select avg(F1) from mytable where mytable.RowNumber <= T.RowNumber) as avgF1, --average excluding null values (select sum(F1) from mytable where mytable.RowNumber <= T.RowNumber)/T.RowNumber as OverallAvgF1, --average including null values F2, (select sum(F2) from mytable where mytable.RowNumber <= T.RowNumber) as sumF2, (select avg(F2) from mytable where mytable.RowNumber <= T.RowNumber) as avgF2, (select sum(F2) from mytable where mytable.RowNumber <= T.RowNumber)/T.RowNumber as OverallAvgF2, --Other fields, their running sums and averages go here from mytable as T order by RowNumber I have about 3000 rows in my table and about 10 fields I am doing this for. When I select F1 through F10 alone, the query runs in negligible time. But when I run the above query with running sums and averages included (total of 40 columns instead of 10 columns), the time increases to well over 5 minutes. When I run the query at the SQLite command line, the first 1000 or so records are produced very fast, but as the row number increases, I can see SQLite struggling to produce outputs. By the time I get to the end of the query, the records are being produced at about a rate of only 1 or 2 per second. Is there anything I can do to reduce the time taken? Is this the most efficient way to query for running sums and averages? Any other ideas to get what I need any quicker? The added wrinkle in all this is that mytable is actually not a table, but a view. F1 through F10 are computed from various tables and brought together in this view. So, I don't think I can create indexes on any of these fields. I have thought about creating a temp table out of this view so that I can index RowNumber (I don't know whether it will help) but it goes against my minimalist instincts. Thank you. Balaji Ramanathan ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Running sums and averages
>>> Is there anything I can do to reduce the time taken? > > < Simon correctly advised > > > Do it in your favourite programming language rather than SQL. > Let me be even more clear : > Memory is cheap and most servers have plenty. > Processors are fast and most servers have multiple with many cores. > Select the entire table of columns you need into memory. > Write a little code. > No it won't scale very well into millions of rows but I could easily run > a test and I will bet many mnay many dollars that processing the sums in > memory is orders of magnitude faster than SQL. You shouldn't even need to read the entire table (or view) into memory: just read row-by-row, and for each field, keep a running total and the count of non-NULL values. From these you can calculate your total and both types of average. Graham ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Running sums and averages
EXPLAIN QUERY PLAN is the first step. --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Balaji Ramanathan >Sent: Sunday, 12 November, 2017 09:06 >To: sqlite-users@mailinglists.sqlite.org >Subject: [sqlite] Running sums and averages > >Hi, > >I have a table (mytable) with several numerical fields. Let us >call >them F1, F2, etc., etc. The table also has a row number field >(RowNumber) >that I make sure has incremental values from 1 through the number of >rows >in the table. > >In addition to printing out the values of F1 through Fn, I also >want to >print out, for each row of the table, the running sums and averages >of >these fields. There are two averages: one that uses only the count >of >non-null values of each field as the denominator and another that >uses the >count of both null and non-null values of each field. So, I wrote a >query >as below to calculate and print those values: > >select RowNumber, >F1, >(select sum(F1) from mytable where mytable.RowNumber <= T.RowNumber) >as >sumF1, >(select avg(F1) from mytable where mytable.RowNumber <= T.RowNumber) >as >avgF1, --average excluding null values >(select sum(F1) from mytable where mytable.RowNumber <= >T.RowNumber)/T.RowNumber as OverallAvgF1, --average including null >values >F2, >(select sum(F2) from mytable where mytable.RowNumber <= T.RowNumber) >as >sumF2, >(select avg(F2) from mytable where mytable.RowNumber <= T.RowNumber) >as >avgF2, >(select sum(F2) from mytable where mytable.RowNumber <= >T.RowNumber)/T.RowNumber as OverallAvgF2, >--Other fields, their running sums and averages go here >from mytable as T order by RowNumber > >I have about 3000 rows in my table and about 10 fields I am doing >this >for. When I select F1 through F10 alone, the query runs in >negligible >time. But when I run the above query with running sums and averages >included (total of 40 columns instead of 10 columns), the time >increases to >well over 5 minutes. > >When I run the query at the SQLite command line, the first 1000 >or so >records are produced very fast, but as the row number increases, I >can see >SQLite struggling to produce outputs. By the time I get to the end >of the >query, the records are being produced at about a rate of only 1 or 2 >per >second. > >Is there anything I can do to reduce the time taken? Is this the >most >efficient way to query for running sums and averages? Any other >ideas to >get what I need any quicker? > >The added wrinkle in all this is that mytable is actually not a >table, >but a view. F1 through F10 are computed from various tables and >brought >together in this view. So, I don't think I can create indexes on any >of >these fields. I have thought about creating a temp table out of this >view >so that I can index RowNumber (I don't know whether it will help) but >it >goes against my minimalist instincts. > >Thank you. > >Balaji Ramanathan >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Running sums and averages
Is there anything I can do to reduce the time taken? 123456789+123456789+123456789+123456789+123456789+123456789+123456789+12 > < Simon correctly advised > > Do it in your favourite programming language rather than SQL. Let me be even more clear : Memory is cheap and most servers have plenty. Processors are fast and most servers have multiple with many cores. Select the entire table of columns you need into memory. Write a little code. No it won't scale very well into millions of rows but I could easily run a test and I will bet many mnay many dollars that processing the sums in memory is orders of magnitude faster than SQL. Dennis ps: if your db is MySQL or Oracle db then the problem is trivial with the C API ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Running sums and averages
On Nov 12, 2017 11:06 AM, "Balaji Ramanathan"wrote: Hi, I have a table (mytable) with several numerical fields. Let us call them F1, F2, etc., etc. The table also has a row number field (RowNumber) that I make sure has incremental values from 1 through the number of rows in the table. In addition to printing out the values of F1 through Fn, I also want to print out, for each row of the table, the running sums and averages of these fields. There are two averages: one that uses only the count of non-null values of each field as the denominator and another that uses the count of both null and non-null values of each field. So, I wrote a query as below to calculate and print those values: select RowNumber, F1, (select sum(F1) from mytable where mytable.RowNumber <= T.RowNumber) as sumF1, (select avg(F1) from mytable where mytable.RowNumber <= T.RowNumber) as avgF1, --average excluding null values (select sum(F1) from mytable where mytable.RowNumber <= T.RowNumber)/T.RowNumber as OverallAvgF1, --average including null values F2, (select sum(F2) from mytable where mytable.RowNumber <= T.RowNumber) as sumF2, (select avg(F2) from mytable where mytable.RowNumber <= T.RowNumber) as avgF2, (select sum(F2) from mytable where mytable.RowNumber <= T.RowNumber)/T.RowNumber as OverallAvgF2, --Other fields, their running sums and averages go here from mytable as T order by RowNumber I have about 3000 rows in my table and about 10 fields I am doing this for. When I select F1 through F10 alone, the query runs in negligible time. But when I run the above query with running sums and averages included (total of 40 columns instead of 10 columns), the time increases to well over 5 minutes. When I run the query at the SQLite command line, the first 1000 or so records are produced very fast, but as the row number increases, I can see SQLite struggling to produce outputs. By the time I get to the end of the query, the records are being produced at about a rate of only 1 or 2 per second. Is there anything I can do to reduce the time taken? Is this the most efficient way to query for running sums and averages? Any other ideas to get what I need any quicker? The added wrinkle in all this is that mytable is actually not a table, but a view. F1 through F10 are computed from various tables and brought together in this view. So, I don't think I can create indexes on any of these fields. I have thought about creating a temp table out of this view so that I can index RowNumber (I don't know whether it will help) but it goes against my minimalist instincts. I am *FAR* from a SQLite (or SQL) guru, but it seems obvious to me that your calculations for sumF1, avgF1, OverallAvgF1, etc as written are running in quadratic time based on the number of rows in table T. Worse yet, you're recalculating all these values entirely from scratch every time you're recalculating them, rather than use the previously calculated value. Is there any way you can calculate table T with just RowNumber, F1, F2, ..., F10, and then as you subsequently step through T row by row calculate all the sums and averages as you step, saving the calculations for each row as you calculate them in temporary variables for use in calculating the values for the next row? This might be doable in pure SQL code, or you might have to do it in whatever language you're making calls to SQLite from. That should be linear time, at worst 2 times the number of rows in T (once to construct T, once to step through it). Be well. Joseph ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Running sums and averages
On 12 Nov 2017, at 4:05pm, Balaji Ramanathanwrote: > Is there anything I can do to reduce the time taken? Do it in your favourite programming language rather than SQL. Congratulations on finding a way of doing it in SQL, but running sums and averages are trivial in a programming language and will be far faster and you can be as fussy as you like about whether nulls are counted and such things. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users