Re: [sqlite] Break on cumulative sum
On Sun, 3 Mar 2013 07:19:11 -0600 "Michael Black" wrote: > One thing I'll point out...the trigger solution is almost linear > performance and constant memory where the single SQL statement will > suffer notably as time goes on in both those measures. Also the > single SQL statement doesn't seem like a good idea if the data is > coming in real time. He never mentioned how big his table is going to > be. If TextField 'A' is deleted, doesn't the entire table have to be updated? Sure, that's O(n), but where n is the size of the table, not the size of the update. Acknowledged, sometimes there are reasons to store derived results. But that choice bears costs, both in updating the database and in engineering it, costs that can be avoided if they can be avoided. Knuth said premature optimization is the root of all evil, and I agree. :-) Normalized data minimize the cost of manipulation. That is why we normalize our databases. As a side-effect, normalized data permit more efficient use of memory, benefiting SELECT, too. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Break on cumulative sum
On Sun, 3 Mar 2013 18:50:35 +0100 Petite Abeille wrote: > Ah? if only? SQLite had analytical functions? oh, well.. What do you have in mind? I've benn adding some user defined functions and am thinking of creating a repository for them. --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Break on cumulative sum
On Mar 3, 2013, at 2:10 PM, "James K. Lowden" wrote: > There's some cruft, too. I was only demonstrating that it could be > done. If you find a way to simplify it, you'll know you understand > it. Ah… if only… SQLite had analytical functions… oh, well.. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Break on cumulative sum
One thing I'll point out...the trigger solution is almost linear performance and constant memory where the single SQL statement will suffer notably as time goes on in both those measures. Also the single SQL statement doesn't seem like a good idea if the data is coming in real time. He never mentioned how big his table is going to be. -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of James K. Lowden Sent: Sunday, March 03, 2013 7:10 AM To: sqlite-users@sqlite.org Subject: Re: [sqlite] Break on cumulative sum On Sun, 3 Mar 2013 11:46:26 +0100 Gert Van Assche wrote: > My SQL knowledge is not yet good enough to understand what you > proposed, but I hope one day that I will be able to understand and > replicate myself the code you wrote. I started of with a join on the > table itself as well, but I got stuck. I'll study your code and I'll > grow. Happy to be service, Gert. The trick is always the same: get your ducks in a row and shoot. Figure out the formula that computes the answer you want, and put the constituent parts on the row, then compute. The first order of business in your case was a running sum. That's done by adding up everything "before" the current row, where "before" means "less than" something, however ordered. select a.*, sum(b.ValueField) as total from Test as a join Test as b on a.TextField >= b.TextField group by a.TextField , a.ValueField , a.CumulativeValue , a.BreakPoint For each row in "a", get all the rows in "b" whose TextField is less than (or equal to, here) to a's. Here it is, live and in color: $ sqlite3 -echo db < self.sql select a.TextField , a.ValueField , a.BreakPoint , sum(b.ValueField) as total from Test as a join Test as b on a.TextField >= b.TextField group by a.TextField , a.ValueField , a.CumulativeValue , a.BreakPoint ; TextField ValueField BreakPoint total -- -- -- -- A 2 10 2 B 3 10 5 C 2 10 7 D 4 10 11 E 5 10 16 F 1 10 17 G 1 10 18 H 5 10 23 I 11 10 34 J 8 10 42 K 2 10 44 That formed my building block. Your formula is almost GN = 1 + total / Breakpoint By using integer arithmetic, I could divide "total" by BreakPoint to get a multiple, which is almost what you call a GroupName, except for that "one past" rule I remarked on, where for example 11 is part of the first group. But by joining my block to itself, offset by one, we can slide the multiplier down a row. For CumulativeValue, your formula is CV = total - [max total of previous GN] That requires computing the GN, grouping on it, and finding the maximum total for it, and joining that result back to the main one, where it can be subtracted from the main total. So, yes, there are a lot of joins. But I didn't write the query top-down, and you shouldn't read it that way. Read it inside-out. Each piece can be separately evaluated, and each operand used in the calculation can be exposed in the SELECT clause. There's some cruft, too. I was only demonstrating that it could be done. If you find a way to simplify it, you'll know you understand it. Regards, --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Break on cumulative sum
On Sun, 3 Mar 2013 11:46:26 +0100 Gert Van Assche wrote: > My SQL knowledge is not yet good enough to understand what you > proposed, but I hope one day that I will be able to understand and > replicate myself the code you wrote. I started of with a join on the > table itself as well, but I got stuck. I'll study your code and I'll > grow. Happy to be service, Gert. The trick is always the same: get your ducks in a row and shoot. Figure out the formula that computes the answer you want, and put the constituent parts on the row, then compute. The first order of business in your case was a running sum. That's done by adding up everything "before" the current row, where "before" means "less than" something, however ordered. select a.*, sum(b.ValueField) as total from Test as a join Test as b on a.TextField >= b.TextField group by a.TextField , a.ValueField , a.CumulativeValue , a.BreakPoint For each row in "a", get all the rows in "b" whose TextField is less than (or equal to, here) to a's. Here it is, live and in color: $ sqlite3 -echo db < self.sql select a.TextField , a.ValueField , a.BreakPoint , sum(b.ValueField) as total from Test as a join Test as b on a.TextField >= b.TextField group by a.TextField , a.ValueField , a.CumulativeValue , a.BreakPoint ; TextField ValueField BreakPoint total -- -- -- -- A 2 10 2 B 3 10 5 C 2 10 7 D 4 10 11 E 5 10 16 F 1 10 17 G 1 10 18 H 5 10 23 I 11 10 34 J 8 10 42 K 2 10 44 That formed my building block. Your formula is almost GN = 1 + total / Breakpoint By using integer arithmetic, I could divide "total" by BreakPoint to get a multiple, which is almost what you call a GroupName, except for that "one past" rule I remarked on, where for example 11 is part of the first group. But by joining my block to itself, offset by one, we can slide the multiplier down a row. For CumulativeValue, your formula is CV = total - [max total of previous GN] That requires computing the GN, grouping on it, and finding the maximum total for it, and joining that result back to the main one, where it can be subtracted from the main total. So, yes, there are a lot of joins. But I didn't write the query top-down, and you shouldn't read it that way. Read it inside-out. Each piece can be separately evaluated, and each operand used in the calculation can be exposed in the SELECT clause. There's some cruft, too. I was only demonstrating that it could be done. If you find a way to simplify it, you'll know you understand it. Regards, --jkl ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Break on cumulative sum
James, I'm very impressed by your solution. My SQL knowledge is not yet good enough to understand what you proposed, but I hope one day that I will be able to understand and replicate myself the code you wrote. I started of with a join on the table itself as well, but I got stuck. I'll study your code and I'll grow. That is the beauty of groups like this: hobby developers like me get advice from experts on problems we're facing. Very helpful, not just the code, but also the on-problem training! Thanks for your work, all! gert 2013/3/3 James K. Lowden > On Sat, 2 Mar 2013 14:46:40 +0100 > Gert Van Assche wrote: > > > All, I don't know how to achieve this: I need to put the cumulative > > sum in a field, and create a group as soon as that cumulative sum is > > over a breakpoint value (10). > > This was a bit of a challenge because the group definition is IMO a > little strange: The first value *after* the breakpoint is included in > the group. That made the SQL more convoluted than it would be > otherwise. > > That said, the attached SQL does the job in one statement. No triggers > are needed. > > $ sqlite3 db < running_sum.sql > TextField ValueField CumulativeValue BreakPoint GroupName > -- -- --- -- -- > A 2 210 1 > B 3 510 1 > C 2 710 1 > D 4 11 10 1 > E 5 510 2 > F 1 610 2 > G 1 710 2 > H 5 12 10 2 > I 11 11 10 3 > J 8 810 4 > K 2 10 10 4 > > This lets you define the results in terms of the base data instead of > keeping a table of derived values. No updates required, just a view. > Unless you have data-warehouse rowcounts or a very limited machine, it > should perform just fine, despite what one might guess, because the I/O > is limited to one table. It should also outperform any per-row > solution. > > Two bits of advice if I may. > > Be careful of using triggers to maintain data consistency. Triggers > are very good for enforcing referential integrity in situations that > DRI can't handle. Beyond that they get very complex and error prone. > > Using triggers to compute derived values signals redundancy in the > database. That redundancy often is not as desirable as the > denormalize-for-performance crowd thinks. Better to compute derived > values on demand (as in a view). If that's demonstrated to be too > slow, a periodic update in a cron job or similar should provide better > thoughput than recomputing on every insert/update/delete. You'll also > do yourself a favor by segregating derived values in a different table > that can be regenerated at will. > > My other suggestion is to eschew abstract nouns in column names. Words > like data, field, and value add no meaning. After all, it's a > database. If it's not a value, what is it? > > This table, > > CREATE TABLE [Test] > ( Name > , Value > , RunningSum > , Breakpoint /* (one word, small 'p') */ DEFAULT 10 > , GroupName > , primary key (Name) > ); > > is at least as clear, wouldn't you say? > > HTH. > > --jkl > > P.S. In case the listserv strips attachments, here it is in plain > text. > > [SQL] > select A.TextField, A.ValueField >, A.total - coalesce(C.GroupTotal, 0) as CumulativeValue >, A.Breakpoint > -- , coalesce(C.GroupTotal, 0) as GroupTotal >, coalesce(B.GroupName, A.GroupName) as GroupName > from ( > select a.*, sum(b.ValueField) as total >, 1 + sum(b.ValueField) / (1+a.Breakpoint) as GroupName > from Test as a join Test as b > on a.TextField >= b.TextField > group by a.TextField > , a.ValueField > , a.CumulativeValue > , a.BreakPoint > ) as A > left join ( > select a.*, sum(b.ValueField) as total >, 1 + sum(b.ValueField) / (1 + a.Breakpoint) as GroupName > from Test as a join Test as b > on a.TextField >= b.TextField > group by a.TextField > , a.ValueField > , a.CumulativeValue > , a.BreakPoint > ) as B > on B.TextField = (select max(TextField) from Test where TextField < > A.TextField) left join ( > select1 + T.GroupName as GroupName > , max(T.total) as GroupTotal > from ( > selectA.TextField > , A.ValueField > , A.total > , coalesce(B.GroupName, 1) as GroupName > from (
Re: [sqlite] Break on cumulative sum
On Sat, 2 Mar 2013 14:46:40 +0100 Gert Van Assche wrote: > All, I don't know how to achieve this: I need to put the cumulative > sum in a field, and create a group as soon as that cumulative sum is > over a breakpoint value (10). This was a bit of a challenge because the group definition is IMO a little strange: The first value *after* the breakpoint is included in the group. That made the SQL more convoluted than it would be otherwise. That said, the attached SQL does the job in one statement. No triggers are needed. $ sqlite3 db < running_sum.sql TextField ValueField CumulativeValue BreakPoint GroupName -- -- --- -- -- A 2 210 1 B 3 510 1 C 2 710 1 D 4 11 10 1 E 5 510 2 F 1 610 2 G 1 710 2 H 5 12 10 2 I 11 11 10 3 J 8 810 4 K 2 10 10 4 This lets you define the results in terms of the base data instead of keeping a table of derived values. No updates required, just a view. Unless you have data-warehouse rowcounts or a very limited machine, it should perform just fine, despite what one might guess, because the I/O is limited to one table. It should also outperform any per-row solution. Two bits of advice if I may. Be careful of using triggers to maintain data consistency. Triggers are very good for enforcing referential integrity in situations that DRI can't handle. Beyond that they get very complex and error prone. Using triggers to compute derived values signals redundancy in the database. That redundancy often is not as desirable as the denormalize-for-performance crowd thinks. Better to compute derived values on demand (as in a view). If that's demonstrated to be too slow, a periodic update in a cron job or similar should provide better thoughput than recomputing on every insert/update/delete. You'll also do yourself a favor by segregating derived values in a different table that can be regenerated at will. My other suggestion is to eschew abstract nouns in column names. Words like data, field, and value add no meaning. After all, it's a database. If it's not a value, what is it? This table, CREATE TABLE [Test] ( Name , Value , RunningSum , Breakpoint /* (one word, small 'p') */ DEFAULT 10 , GroupName , primary key (Name) ); is at least as clear, wouldn't you say? HTH. --jkl P.S. In case the listserv strips attachments, here it is in plain text. [SQL] select A.TextField, A.ValueField , A.total - coalesce(C.GroupTotal, 0) as CumulativeValue , A.Breakpoint -- , coalesce(C.GroupTotal, 0) as GroupTotal , coalesce(B.GroupName, A.GroupName) as GroupName from ( select a.*, sum(b.ValueField) as total , 1 + sum(b.ValueField) / (1+a.Breakpoint) as GroupName from Test as a join Test as b on a.TextField >= b.TextField group by a.TextField , a.ValueField , a.CumulativeValue , a.BreakPoint ) as A left join ( select a.*, sum(b.ValueField) as total , 1 + sum(b.ValueField) / (1 + a.Breakpoint) as GroupName from Test as a join Test as b on a.TextField >= b.TextField group by a.TextField , a.ValueField , a.CumulativeValue , a.BreakPoint ) as B on B.TextField = (select max(TextField) from Test where TextField < A.TextField) left join ( select1 + T.GroupName as GroupName , max(T.total) as GroupTotal from ( selectA.TextField , A.ValueField , A.total , coalesce(B.GroupName, 1) as GroupName from ( select a.*, sum(b.ValueField) as total , 1 + sum(b.ValueField) / (1 + a.Breakpoint) as GroupName from Test as a join Test as b on a.TextField >= b.TextField group by a.TextField , a.ValueField , a.CumulativeValue , a.BreakPoint ) as A left join ( select a.*, sum(b.ValueField) as total , 1 + sum(b.ValueField) / (1 + a.Breakpoint) as GroupN