Re: [sqlite] Interpolation
[Simon Slavin] > On 10 Feb 2012, at 3:24pm, Steinar Midtskogen wrote: > >> I feared that. As it is, it takes 6 seconds to do a SELECT * FROM >> Combined LIMIT 1 ("Combined" is a view representing the merged table). >> If I add an ORDER BY, it takes 35 seconds. >> >> Any way to speed up the ordering? > > Are you putting the ORDER BY in the VIEW definition or the SELECT definition > ? Whichever you're doing, try the other one. Also, is there an index which > provides a sorted list in an order which suits your ORDER BY clause ? I created the view this way: CREATE VIEW Combined AS SELECT strftime("%Y-%m-%d %H:%M:%S", unix_time, "unixepoch") AS time, * FROM (SELECT unix_time FROM Voksenlia1 UNION SELECT unix_time FROM Voksenlia2 UNION SELECT unix_time FROM Voksenlia3 UNION SELECT unix_time FROM Voksenlia4 UNION SELECT unix_time FROM Voksenlia5 UNION SELECT unix_time FROM Voksenlia6 UNION SELECT unix_time FROM Voksenlia8 ) LEFT NATURAL JOIN Voksenlia1 LEFT NATURAL JOIN Voksenlia2 LEFT NATURAL JOIN Voksenlia3 LEFT NATURAL JOIN Voksenlia4 LEFT NATURAL JOIN Voksenlia5 LEFT NATURAL JOIN Voksenlia6 LEFT NATURAL JOIN Voksenlia8 ORDER BY unix_time; All 7 tables have a PRIMARY KEY (unix_time) I tried then this: CREATE VIEW Combined AS SELECT strftime("%Y-%m-%d %H:%M:%S", unix_time, "unixepoch") AS time, * FROM (SELECT unix_time FROM Voksenlia1 UNION SELECT unix_time FROM Voksenlia2 UNION SELECT unix_time FROM Voksenlia3 UNION SELECT unix_time FROM Voksenlia4 UNION SELECT unix_time FROM Voksenlia5 UNION SELECT unix_time FROM Voksenlia6 UNION SELECT unix_time FROM Voksenlia8 ORDER BY unix_time ) LEFT NATURAL JOIN Voksenlia1 LEFT NATURAL JOIN Voksenlia2 LEFT NATURAL JOIN Voksenlia3 LEFT NATURAL JOIN Voksenlia4 LEFT NATURAL JOIN Voksenlia5 LEFT NATURAL JOIN Voksenlia6 LEFT NATURAL JOIN Voksenlia8; And I got a big speedup. Is this what you meant? The combined view currently has 84 columns and 2,548,717 rows, so doing things the wrong way makes a huge impact. -- Steinar ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
On 10 Feb 2012, at 3:24pm, Steinar Midtskogen wrote: > I feared that. As it is, it takes 6 seconds to do a SELECT * FROM > Combined LIMIT 1 ("Combined" is a view representing the merged table). > If I add an ORDER BY, it takes 35 seconds. > > Any way to speed up the ordering? Are you putting the ORDER BY in the VIEW definition or the SELECT definition ? Whichever you're doing, try the other one. Also, is there an index which provides a sorted list in an order which suits your ORDER BY clause ? Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
[Igor Tandetnik] > If you need a particular order, it's best to add an explicit ORDER BY. > Otherwise, you are at the mercy of an implementation. Your current version of > SQLite chooses an execution plan that happens, by accident, to produce rows > in the desired order. Tomorrow you upgrade to a new version, and it chooses a > different execution plan that results in a different order. I feared that. As it is, it takes 6 seconds to do a SELECT * FROM Combined LIMIT 1 ("Combined" is a view representing the merged table). If I add an ORDER BY, it takes 35 seconds. Any way to speed up the ordering? I think I'll need the ordering to do interpolation. -- Steinar ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
Steinar Midtskogenwrote: > [Igor Tandetnik] >> Try something like this: >> >> select timestamp, value1, ..., value6 from >> (select timestamp from tab1 >> union >> select timestamp from tab2 >> union >> select timestamp from tab3) >> left join tab1 using (timespamp) >> left join tab2 using (timespamp) >> left join tab3 using (timespamp); > > Wonderful! It also eliminates the need to list all the nulls and > values, and as a bonus it gets ordered by timestamp whereas the > previous solution required an "ORDER BY timestamp" If you need a particular order, it's best to add an explicit ORDER BY. Otherwise, you are at the mercy of an implementation. Your current version of SQLite chooses an execution plan that happens, by accident, to produce rows in the desired order. Tomorrow you upgrade to a new version, and it chooses a different execution plan that results in a different order. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
[Igor Tandetnik] > Steinar Midtskogenwrote: >> >> Thanks, I didn't think in that simple terms. :) I think about listing >> all the values, so I got lost. I lost a word there: "I didn't think about listing"... >> >> But what if the tables share a timestamp, then I would get, say: >> >> 1328873300|1|2| | | | >> 1328873300| | |3| | | >> 1328873300| | | |4|5|6 >> >> How can that get collapsed into: >> >> 1328873300|1|2|3|4|5|6 > > Try something like this: > > select timestamp, value1, ..., value6 from > (select timestamp from tab1 > union > select timestamp from tab2 > union > select timestamp from tab3) > left join tab1 using (timespamp) > left join tab2 using (timespamp) > left join tab3 using (timespamp); Wonderful! It also eliminates the need to list all the nulls and values, and as a bonus it gets ordered by timestamp whereas the previous solution required an "ORDER BY timestamp" which made everything slower than this solution (for some reason). -- Steinar ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
Steinar Midtskogenwrote: > [Igor Tandetnik] > >>> timestamp|value1|value2|value3|value4|value5|value6 >>> 1328873000|1|2| | | | >>> 1328873050| | |7| | | >>> 1328873075| | | |10|13|16 >>> 1328873100|3|4| | | | >>> 1328873150| | |8| | | >>> 1328873175| | | |11|14|17 >>> 1328873200|5|6| | | | >>> 1328873250| | |9| | | >>> 1328873275| | | |12|15|18 >>> >>> But, first things first, how can I merge my tables to get the combined >>> table with NULLs? >> >> select value1, value2, null, null, null, null from tab1 >> union all >> select null, null, value3, null, null, null from tab2 >> union all >> select null, null, null, value4, value5, value6 from tab3; > > Thanks, I didn't think in that simple terms. :) I think about listing > all the values, so I got lost. > > But what if the tables share a timestamp, then I would get, say: > > 1328873300|1|2| | | | > 1328873300| | |3| | | > 1328873300| | | |4|5|6 > > How can that get collapsed into: > > 1328873300|1|2|3|4|5|6 Try something like this: select timestamp, value1, ..., value6 from (select timestamp from tab1 union select timestamp from tab2 union select timestamp from tab3) left join tab1 using (timespamp) left join tab2 using (timespamp) left join tab3 using (timespamp); -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
[Igor Tandetnik] >> timestamp|value1|value2|value3|value4|value5|value6 >> 1328873000|1|2| | | | >> 1328873050| | |7| | | >> 1328873075| | | |10|13|16 >> 1328873100|3|4| | | | >> 1328873150| | |8| | | >> 1328873175| | | |11|14|17 >> 1328873200|5|6| | | | >> 1328873250| | |9| | | >> 1328873275| | | |12|15|18 >> >> But, first things first, how can I merge my tables to get the combined >> table with NULLs? > > select value1, value2, null, null, null, null from tab1 > union all > select null, null, value3, null, null, null from tab2 > union all > select null, null, null, value4, value5, value6 from tab3; Thanks, I didn't think in that simple terms. :) I think about listing all the values, so I got lost. But what if the tables share a timestamp, then I would get, say: 1328873300|1|2| | | | 1328873300| | |3| | | 1328873300| | | |4|5|6 How can that get collapsed into: 1328873300|1|2|3|4|5|6 ? One way could be to fill out the missing values using interpolation (as I would like anyway), then remove duplicate lines, but if there is a simple way before I attempt to interpolate, I should probably collapse first. -- Steinar ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
Steinar Midtskogenwrote: > Let's say I have N tables, each with a > timestamp as primary key. For instance: > > tab1: > timestamp|value1|value2 > 1328873000|1|2 > 1328873100|3|4 > 1328873200|5|6 > > tab2: > timestamp|value3 > 1328873050|7 > 1328873150|8 > 1328873250|9 > > tab3: > timestamp|value4|value5|value6 > 1328873075|10|13|16 > 1328873175|11|14|17 > 1328873275|12|15|18 > > So the resulting table should be: > > timestamp|value1|value2|value3|value4|value5|value6 > 1328873000|1|2| | | | > 1328873050| | |7| | | > 1328873075| | | |10|13|16 > 1328873100|3|4| | | | > 1328873150| | |8| | | > 1328873175| | | |11|14|17 > 1328873200|5|6| | | | > 1328873250| | |9| | | > 1328873275| | | |12|15|18 > > But, first things first, how can I merge my tables to get the combined > table with NULLs? select value1, value2, null, null, null, null from tab1 union all select null, null, value3, null, null, null from tab2 union all select null, null, null, value4, value5, value6 from tab3; -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
I've rethought the interpolation strategy. It's not important to be able to look up any timestamp, just the timestamps that actually have values in at least one table. Let's say I have N tables, each with a timestamp as primary key. For instance: tab1: timestamp|value1|value2 1328873000|1|2 1328873100|3|4 1328873200|5|6 tab2: timestamp|value3 1328873050|7 1328873150|8 1328873250|9 tab3: timestamp|value4|value5|value6 1328873075|10|13|16 1328873175|11|14|17 1328873275|12|15|18 First, I'd like to merge all tables to create one single table with every timestamp and every value. That is, it will have the timestamps that I get by: SELECT timestamp FROM tab1 UNION SELECT timestamp FROM tab2 UNION SELECT timestamp FROM tab3; So the resulting table should be: timestamp|value1|value2|value3|value4|value5|value6 1328873000|1|2| | | | 1328873050| | |7| | | 1328873075| | | |10|13|16 1328873100|3|4| | | | 1328873150| | |8| | | 1328873175| | | |11|14|17 1328873200|5|6| | | | 1328873250| | |9| | | 1328873275| | | |12|15|18 The resulting table will have a lot of NULLs. Next, I'd like to fill out all NULLs by using linear interpolation. But, first things first, how can I merge my tables to get the combined table with NULLs? I've been playing with JOIN and UNION, but I'm afraid my SQL experience is very limited, so I got stuck at this one. In reality I have 7 tables to be merged with a lot of columns (perhaps 100 in all) and up to a million rows, so if an SQL statement to do this sounds unrealistic, that would be a good answer as well. Thanks, -- Steinar Midtskogen ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 08/02/12 01:47, Steinar Midtskogen wrote: > OK. But then I wont have an SQL interface for accessing interpolated > data. It's acceptable. An SQL interface is mostly a "nice to have", > so I could do some quick queries in SQL instead of having to write C > code for it. Making SQL functions that call your C code will do the trick. Also do you know how easy it is to add a SQL shell to your C code? Read a line of input and call sqlite3_complete on it. If not complete, read another line appending to earlier line and call sqlite3_complete again. Rinse and repeat until you have a complete line then prepare, step and output whatever it returns. > So, if I go for that approach, you'd recommend that I add functions, > such as "reading", and if I want a mean temperature, I should add a > function "mean" and not try to change what AVG will do? Functions you add to SQLite do not have to work on tables mentioned in the query. The doc is a little confusing: http://www.sqlite.org/c3ref/create_function.html You need to implement the xFunc callback which is like main() in taking a count of arguments and their values. Once decoded just call your existing C function. Only you know the nature of missing data, how often it is missing, the calculations you want to do and how all this affects the final conclusions you are generating from the data. (You could also fix the data generator to synthesize missing values rather than trying to fix it later.) If you are working with shorter periods (eg a day) then you can also create temporary tables putting in the missing values. But if you absolutely need every value to be present for arbitrary SQL queries then virtual tables will pretty much be the only transparent thing. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) iEYEARECAAYFAk8yxmUACgkQmOOfHg372QTN/gCguW0ld/UZxFmjVcLMGFMt88ip Hz0AoMbgJFwoNXw5DvwLFPe02WCh2Ok1 =C71Q -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
[Kit] > SELECT a.temp+(b.temp-a.temp)/(b.time-a.time)*(strftime('%s','2012-02-08 > 11:37:00')-a.time) FROM > (select time, temp FROM tp > WHERE strftime('%s','2012-02-08 11:37:00')*1>=time > ORDER BY time DESC LIMIT 1) AS a, > (select time, temp FROM tp > WHERE strftime('%s','2012-02-08 11:37:00')*1 ORDER BY time LIMIT 1) AS b; Thanks! I will try to rewrite this as a view. -- Steinar ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
[Kevin Martin] > For the quick and dirty solution, I think you can use something like this to > create your view. You would need to index time, and even with the index, I'm > not too sure about speed. > > select x1.time, x1.value, x2.time from x as x1 left join x as x2 on > x2.time=(select max(time) from x where timesolutions require more complicated queries to account for the fact > that interpolation is going on. Yes, but if creating views does the queries fast enough for me, I'll be pragmatic about this. :) -- Steinar ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
2012/2/8 Steinar Midtskogen: > 1. I'd like to be able to look up any timestamp between the oldest and > the newest in the database, and if there is no value stored for that > timestamp, the value given should be an interpolation of the two > closest. So, if the table has: > > 1325376000 (Jan 1 2012 00:00:00 UTC) | 5.0 > 1325376300 (Jan 1 2012 00:05:00 UTC) | 10.0 > > and I do "SELECT temp FROM temperature WHERE unix_time = 1325376120" > (00:02:00) I should get 7.0. create table tp (time default (datetime('%s','now')), temp real); insert into tp values (1328700574,2.5); insert into tp values (1328701004,4.8); insert into tp values (1328701060,5.1); insert into tp values (1328701093,5.2); select datetime(time,'unixepoch'),temp from tp; -- 2012-02-08 11:29:34|2.5 -- 2012-02-08 11:36:44|4.8 -- 2012-02-08 11:37:40|5.1 -- 2012-02-08 11:38:13|5.2 SELECT a.temp+(b.temp-a.temp)/(b.time-a.time)*(strftime('%s','2012-02-08 11:37:00')-a.time) FROM (select time, temp FROM tp WHERE strftime('%s','2012-02-08 11:37:00')*1>=time ORDER BY time DESC LIMIT 1) AS a, (select time, temp FROM tp WHERE strftime('%s','2012-02-08 11:37:00')*1http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
For the quick and dirty solution, I think you can use something like this to create your view. You would need to index time, and even with the index, I'm not too sure about speed. select x1.time, x1.value, x2.time from x as x1 left join x as x2 on x2.time=(select max(time) from x where timeRelated to this thread, I wonder if it's possible to create a view > which can give me a value from the row immediately above. E.g. given > the table: > > unix_time val > --+--- > 1325376000|val1 > 1325376300|val2 > 1325376600|val3 > 1325376900|val4 > > (the first column is a unix timestamp and unique) > > can I create a view which gives me: > > unix_time val prev_unix_time > --++-- > 1325376000|val1| > 1325376300|val2|1325376000 > 1325376600|val3|1325376300 > 1325376900|val4|1325376600 > > Something like this will not work: > > create view new as select unix_time, val, (select unix_time from old where > new.unix_time < old.unix_time order by unix_time desc limit 1) as > prev_unix_time from old; > > as I can't refer to new.unix_time inside the view that defines "new". > > The idea is, if this is possible, then I should be able to get my > weighted average by something like this (not verified, but you get the > idea): > > select sum(val * (unix_time - prev_unix_time)) / sum(unix_time - > prev_unix_time) from new; > > -- > Steinar > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
Related to this thread, I wonder if it's possible to create a view which can give me a value from the row immediately above. E.g. given the table: unix_time val --+--- 1325376000|val1 1325376300|val2 1325376600|val3 1325376900|val4 (the first column is a unix timestamp and unique) can I create a view which gives me: unix_time val prev_unix_time --++-- 1325376000|val1| 1325376300|val2|1325376000 1325376600|val3|1325376300 1325376900|val4|1325376600 Something like this will not work: create view new as select unix_time, val, (select unix_time from old where new.unix_time < old.unix_time order by unix_time desc limit 1) as prev_unix_time from old; as I can't refer to new.unix_time inside the view that defines "new". The idea is, if this is possible, then I should be able to get my weighted average by something like this (not verified, but you get the idea): select sum(val * (unix_time - prev_unix_time)) / sum(unix_time - prev_unix_time) from new; -- Steinar ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
[Roger Binns] > I'd recommend you write code in your application first that knows how to > calculate the values you want. That way you can ensure the calculations > are correct, you have something for test harnesses that produces "good" > values and you have something to port to your final solution. > > I'd probably stop there. Remember that your application code and the > SQLite library are running in the same process. It is almost certain that > it is (better/more convenient/easier to develop and use) for this code to > be app code than within SQLite. OK. But then I wont have an SQL interface for accessing interpolated data. It's acceptable. An SQL interface is mostly a "nice to have", so I could do some quick queries in SQL instead of having to write C code for it. > If someone will be doing queries expecting to match a row with second > granularity then your SQLite side solutions are virtual tables and > functions. The former is well documented. For the latter you can make a > function like "reading" which behind the scenes calls your app code which > prepares a statement, finds neighbouring readings and returns the > interpolated result - eg `select reading("2012-01-01T012345")` So, if I go for that approach, you'd recommend that I add functions, such as "reading", and if I want a mean temperature, I should add a function "mean" and not try to change what AVG will do? Thanks, -- Steinar ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Interpolation
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/02/12 23:14, Steinar Midtskogen wrote: > I'm not asking the list to solve these problems for me, but it would > greatly help if anyone could point me to the right direction. Which > approach would work? I'd recommend you write code in your application first that knows how to calculate the values you want. That way you can ensure the calculations are correct, you have something for test harnesses that produces "good" values and you have something to port to your final solution. I'd probably stop there. Remember that your application code and the SQLite library are running in the same process. It is almost certain that it is (better/more convenient/easier to develop and use) for this code to be app code than within SQLite. If you need other code to access the database and your application code cannot be used then you can use one of the mechanisms you mentioned such as virtual tables. However it has been my experience that faking data integrity for other code tends to lead to bugs and other issues in that code because they don't realise what is going on. (The initial programmer might, but in 6 months someone else does maintenance.) If someone will be doing queries expecting to match a row with second granularity then your SQLite side solutions are virtual tables and functions. The former is well documented. For the latter you can make a function like "reading" which behind the scenes calls your app code which prepares a statement, finds neighbouring readings and returns the interpolated result - eg `select reading("2012-01-01T012345")` Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) iEYEARECAAYFAk8yKgAACgkQmOOfHg372QQxBgCfd07QCC8/f2caqq4d0EZLHg67 d0UAoKcm/eVV/MsUdGk+XgHnXuDWXOLJ =OGRj -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Interpolation
Hello I have a large (> 1GB) collection of text files containing sensor values associated with timestamps, and I want to switch to an sqlite database. For simplicity, let's assume that I have two tables, one that stores temperatures and another that stores relative_humidity: CREATE TABLE temperature ( temp REAL, unix_time INTEGER, PRIMARY KEY (unix_time) ); CREATE TABLE relative_humidity ( rh REAL, unix_time INTEGER, PRIMARY KEY (unix_time) ); What I need to solve is this: 1. I'd like to be able to look up any timestamp between the oldest and the newest in the database, and if there is no value stored for that timestamp, the value given should be an interpolation of the two closest. So, if the table has: 1325376000 (Jan 1 2012 00:00:00 UTC) | 5.0 1325376300 (Jan 1 2012 00:05:00 UTC) | 10.0 and I do "SELECT temp FROM temperature WHERE unix_time = 1325376120" (00:02:00) I should get 7.0. 2. I'd like to calculate averages for certain periods. Now, I could use AVG if everything was stored in fixed intervals, but that's not the case (which is also the reason why I want the interpolation above - the user can't know what valid timestamps are). For instance, let's say I want the average temperature for two consecutive days. The first day was cold and has one value for every hour. The second was hot and has one value for every minute. A plain AVG would not give me what I'd expect, since the hot day would get far too much weight. So when calculating my average, I need to weigth the values depending on the interval they'll represent. 3. Say that I want to know the dew point for a certain timestamp or the average dew point for a whole day. The dew point is calculated from the temperature and the relative humidity. The trouble is that the two tables don't contain the same timestamps, so I can't look up temp and rh in the tables using the timestamp and then do the calculations (unless problem 1 has been solved). In short, if my tables had values for every second, all these problems would go away. I could even use AVG and get what I wanted. Is it possible to create some kind of virtual table which to the user appears to have values for every second? I'm not asking the list to solve these problems for me, but it would greatly help if anyone could point me to the right direction. Which approach would work? Is it possible to create a VIEW to do any of this? Could I use sqlite's virtual table functionality? Or is it best to leave the tables as they are and do what I want to do in C/C++, that is, to abandon the idea that I can get what I want using regular SQL statements. I could, of course, write a program that does all the interpolation for every second and store the interpolated values in the database, that would be very simple, but that would also make the database way too large and slow. Thanks. -- Steinar Midtskogen ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users