Re: [sqlite] Interpolation

2012-02-10 Thread Steinar Midtskogen
[Simon Slavin]

> On 10 Feb 2012, at 3:24pm, Steinar Midtskogen wrote:
>
>> I feared that.  As it is, it takes 6 seconds to do a SELECT * FROM
>> Combined LIMIT 1 ("Combined" is a view representing the merged table).
>> If I add an ORDER BY, it takes 35 seconds.
>> 
>> Any way to speed up the ordering?
>
> Are you putting the ORDER BY in the VIEW definition or the SELECT definition 
> ?  Whichever you're doing, try the other one.  Also, is there an index which 
> provides a sorted list in an order which suits your ORDER BY clause ?

I created the view this way:

CREATE VIEW Combined AS
 SELECT strftime("%Y-%m-%d %H:%M:%S", unix_time, "unixepoch") AS time, * FROM
 (SELECT unix_time FROM Voksenlia1 UNION
  SELECT unix_time FROM Voksenlia2 UNION
  SELECT unix_time FROM Voksenlia3 UNION
  SELECT unix_time FROM Voksenlia4 UNION
  SELECT unix_time FROM Voksenlia5 UNION
  SELECT unix_time FROM Voksenlia6 UNION
  SELECT unix_time FROM Voksenlia8
)
LEFT NATURAL JOIN Voksenlia1
LEFT NATURAL JOIN Voksenlia2
LEFT NATURAL JOIN Voksenlia3
LEFT NATURAL JOIN Voksenlia4
LEFT NATURAL JOIN Voksenlia5
LEFT NATURAL JOIN Voksenlia6
LEFT NATURAL JOIN Voksenlia8 ORDER BY unix_time;

All 7 tables have a PRIMARY KEY (unix_time)

I tried then this:

CREATE VIEW Combined AS
 SELECT strftime("%Y-%m-%d %H:%M:%S", unix_time, "unixepoch") AS time, * FROM
 (SELECT unix_time FROM Voksenlia1 UNION
  SELECT unix_time FROM Voksenlia2 UNION
  SELECT unix_time FROM Voksenlia3 UNION
  SELECT unix_time FROM Voksenlia4 UNION
  SELECT unix_time FROM Voksenlia5 UNION
  SELECT unix_time FROM Voksenlia6 UNION
  SELECT unix_time FROM Voksenlia8 ORDER BY unix_time
)
LEFT NATURAL JOIN Voksenlia1
LEFT NATURAL JOIN Voksenlia2
LEFT NATURAL JOIN Voksenlia3
LEFT NATURAL JOIN Voksenlia4
LEFT NATURAL JOIN Voksenlia5
LEFT NATURAL JOIN Voksenlia6
LEFT NATURAL JOIN Voksenlia8;

And I got a big speedup.  Is this what you meant?

The combined view currently has 84 columns and 2,548,717 rows, so
doing things the wrong way makes a huge impact.
-- 
Steinar
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-10 Thread Simon Slavin

On 10 Feb 2012, at 3:24pm, Steinar Midtskogen wrote:

> I feared that.  As it is, it takes 6 seconds to do a SELECT * FROM
> Combined LIMIT 1 ("Combined" is a view representing the merged table).
> If I add an ORDER BY, it takes 35 seconds.
> 
> Any way to speed up the ordering?

Are you putting the ORDER BY in the VIEW definition or the SELECT definition ?  
Whichever you're doing, try the other one.  Also, is there an index which 
provides a sorted list in an order which suits your ORDER BY clause ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-10 Thread Steinar Midtskogen
[Igor Tandetnik]

> If you need a particular order, it's best to add an explicit ORDER BY. 
> Otherwise, you are at the mercy of an implementation. Your current version of 
> SQLite chooses an execution plan that happens, by accident, to produce rows 
> in the desired order. Tomorrow you upgrade to a new version, and it chooses a 
> different execution plan that results in a different order.

I feared that.  As it is, it takes 6 seconds to do a SELECT * FROM
Combined LIMIT 1 ("Combined" is a view representing the merged table).
If I add an ORDER BY, it takes 35 seconds.

Any way to speed up the ordering?

I think I'll need the ordering to do interpolation.

-- 
Steinar
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-10 Thread Igor Tandetnik
Steinar Midtskogen  wrote:
> [Igor Tandetnik]
>> Try something like this:
>> 
>> select timestamp, value1, ..., value6 from
>> (select timestamp from tab1
>>  union
>>  select timestamp from tab2
>>  union
>>  select timestamp from tab3)
>> left join tab1 using (timespamp)
>> left join tab2 using (timespamp)
>> left join tab3 using (timespamp);
> 
> Wonderful!  It also eliminates the need to list all the nulls and
> values, and as a bonus it gets ordered by timestamp whereas the
> previous solution required an "ORDER BY timestamp"

If you need a particular order, it's best to add an explicit ORDER BY. 
Otherwise, you are at the mercy of an implementation. Your current version of 
SQLite chooses an execution plan that happens, by accident, to produce rows in 
the desired order. Tomorrow you upgrade to a new version, and it chooses a 
different execution plan that results in a different order.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-10 Thread Steinar Midtskogen
[Igor Tandetnik]

> Steinar Midtskogen  wrote:
>> 
>> Thanks, I didn't think in that simple terms. :) I think about listing
>> all the values, so I got lost.

I lost a word there: "I didn't think about listing"...

>> 
>> But what if the tables share a timestamp, then I would get, say:
>> 
>> 1328873300|1|2| | | |
>> 1328873300| | |3| | |
>> 1328873300| | | |4|5|6
>> 
>> How can that get collapsed into:
>> 
>> 1328873300|1|2|3|4|5|6
>
> Try something like this:
>
> select timestamp, value1, ..., value6 from
> (select timestamp from tab1
>  union
>  select timestamp from tab2
>  union
>  select timestamp from tab3)
> left join tab1 using (timespamp)
> left join tab2 using (timespamp)
> left join tab3 using (timespamp);

Wonderful!  It also eliminates the need to list all the nulls and
values, and as a bonus it gets ordered by timestamp whereas the
previous solution required an "ORDER BY timestamp" which made
everything slower than this solution (for some reason).

-- 
Steinar
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-10 Thread Igor Tandetnik
Steinar Midtskogen  wrote:
> [Igor Tandetnik]
> 
>>> timestamp|value1|value2|value3|value4|value5|value6
>>> 1328873000|1|2| |  |  |
>>> 1328873050| | |7|  |  |
>>> 1328873075| | | |10|13|16
>>> 1328873100|3|4| |  |  |
>>> 1328873150| | |8|  |  |
>>> 1328873175| | | |11|14|17
>>> 1328873200|5|6| |  |  |
>>> 1328873250| | |9|  |  |
>>> 1328873275| | | |12|15|18
>>> 
>>> But, first things first, how can I merge my tables to get the combined
>>> table with NULLs?
>> 
>> select value1, value2, null, null, null, null from tab1
>> union all
>> select null, null, value3, null, null, null from tab2
>> union all
>> select null, null, null, value4, value5, value6 from tab3;
> 
> Thanks, I didn't think in that simple terms. :) I think about listing
> all the values, so I got lost.
> 
> But what if the tables share a timestamp, then I would get, say:
> 
> 1328873300|1|2| | | |
> 1328873300| | |3| | |
> 1328873300| | | |4|5|6
> 
> How can that get collapsed into:
> 
> 1328873300|1|2|3|4|5|6

Try something like this:

select timestamp, value1, ..., value6 from
(select timestamp from tab1
 union
 select timestamp from tab2
 union
 select timestamp from tab3)
left join tab1 using (timespamp)
left join tab2 using (timespamp)
left join tab3 using (timespamp);

-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-10 Thread Steinar Midtskogen
[Igor Tandetnik]

>> timestamp|value1|value2|value3|value4|value5|value6
>> 1328873000|1|2| |  |  |
>> 1328873050| | |7|  |  |
>> 1328873075| | | |10|13|16
>> 1328873100|3|4| |  |  |
>> 1328873150| | |8|  |  |
>> 1328873175| | | |11|14|17
>> 1328873200|5|6| |  |  |
>> 1328873250| | |9|  |  |
>> 1328873275| | | |12|15|18
>> 
>> But, first things first, how can I merge my tables to get the combined
>> table with NULLs?
>
> select value1, value2, null, null, null, null from tab1
> union all
> select null, null, value3, null, null, null from tab2
> union all
> select null, null, null, value4, value5, value6 from tab3;

Thanks, I didn't think in that simple terms. :) I think about listing
all the values, so I got lost.

But what if the tables share a timestamp, then I would get, say:

1328873300|1|2| | | |
1328873300| | |3| | |
1328873300| | | |4|5|6

How can that get collapsed into:

1328873300|1|2|3|4|5|6

?

One way could be to fill out the missing values using interpolation
(as I would like anyway), then remove duplicate lines, but if there is
a simple way before I attempt to interpolate, I should probably
collapse first.

-- 
Steinar
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-10 Thread Igor Tandetnik
Steinar Midtskogen  wrote:
> Let's say I have N tables, each with a
> timestamp as primary key.  For instance:
> 
> tab1:
> timestamp|value1|value2
> 1328873000|1|2
> 1328873100|3|4
> 1328873200|5|6
> 
> tab2:
> timestamp|value3
> 1328873050|7
> 1328873150|8
> 1328873250|9
> 
> tab3:
> timestamp|value4|value5|value6
> 1328873075|10|13|16
> 1328873175|11|14|17
> 1328873275|12|15|18
> 
> So the resulting table should be:
> 
> timestamp|value1|value2|value3|value4|value5|value6
> 1328873000|1|2| |  |  |
> 1328873050| | |7|  |  |
> 1328873075| | | |10|13|16
> 1328873100|3|4| |  |  |
> 1328873150| | |8|  |  |
> 1328873175| | | |11|14|17
> 1328873200|5|6| |  |  |
> 1328873250| | |9|  |  |
> 1328873275| | | |12|15|18
> 
> But, first things first, how can I merge my tables to get the combined
> table with NULLs?

select value1, value2, null, null, null, null from tab1
union all
select null, null, value3, null, null, null from tab2
union all
select null, null, null, value4, value5, value6 from tab3;

-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-10 Thread Steinar Midtskogen
I've rethought the interpolation strategy.  It's not important to be
able to look up any timestamp, just the timestamps that actually have
values in at least one table.  Let's say I have N tables, each with a
timestamp as primary key.  For instance:

tab1:
timestamp|value1|value2
1328873000|1|2  
1328873100|3|4  
1328873200|5|6  

tab2:
timestamp|value3
1328873050|7
1328873150|8 
1328873250|9

tab3:
timestamp|value4|value5|value6
1328873075|10|13|16
1328873175|11|14|17
1328873275|12|15|18

First, I'd like to merge all tables to create one single table with
every timestamp and every value.  That is, it will have the timestamps
that I get by:

 SELECT timestamp FROM tab1 UNION
 SELECT timestamp FROM tab2 UNION
 SELECT timestamp FROM tab3;

So the resulting table should be:

timestamp|value1|value2|value3|value4|value5|value6
1328873000|1|2| |  |  |
1328873050| | |7|  |  |
1328873075| | | |10|13|16
1328873100|3|4| |  |  |
1328873150| | |8|  |  |
1328873175| | | |11|14|17
1328873200|5|6| |  |  |
1328873250| | |9|  |  |
1328873275| | | |12|15|18

The resulting table will have a lot of NULLs.  Next, I'd like to fill
out all NULLs by using linear interpolation.

But, first things first, how can I merge my tables to get the combined
table with NULLs?  I've been playing with JOIN and UNION, but I'm
afraid my SQL experience is very limited, so I got stuck at this one.

In reality I have 7 tables to be merged with a lot of columns (perhaps
100 in all) and up to a million rows, so if an SQL statement to do
this sounds unrealistic, that would be a good answer as well.

Thanks,
-- 
Steinar Midtskogen
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-08 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 08/02/12 01:47, Steinar Midtskogen wrote:
> OK.  But then I wont have an SQL interface for accessing interpolated 
> data.  It's acceptable.  An SQL interface is mostly a "nice to have", 
> so I could do some quick queries in SQL instead of having to write C 
> code for it.

Making SQL functions that call your C code will do the trick.

Also do you know how easy it is to add a SQL shell to your C code?  Read a
line of input and call sqlite3_complete on it.  If not complete, read
another line appending to earlier line and call sqlite3_complete again.
Rinse and repeat until you have a complete line then prepare, step and
output whatever it returns.

> So, if I go for that approach, you'd recommend that I add functions, 
> such as "reading", and if I want a mean temperature, I should add a 
> function "mean" and not try to change what AVG will do?

Functions you add to SQLite do not have to work on tables mentioned in the
query.  The doc is a little confusing:

  http://www.sqlite.org/c3ref/create_function.html

You need to implement the xFunc callback which is like main() in taking a
count of arguments and their values.  Once decoded just call your existing
C function.

Only you know the nature of missing data, how often it is missing, the
calculations you want to do and how all this affects the final conclusions
you are generating from the data.  (You could also fix the data generator
to synthesize missing values rather than trying to fix it later.)

If you are working with shorter periods (eg a day) then you can also
create temporary tables putting in the missing values.

But if you absolutely need every value to be present for arbitrary SQL
queries then virtual tables will pretty much be the only transparent thing.

Roger


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk8yxmUACgkQmOOfHg372QTN/gCguW0ld/UZxFmjVcLMGFMt88ip
Hz0AoMbgJFwoNXw5DvwLFPe02WCh2Ok1
=C71Q
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-08 Thread Steinar Midtskogen
[Kit]

> SELECT a.temp+(b.temp-a.temp)/(b.time-a.time)*(strftime('%s','2012-02-08
> 11:37:00')-a.time) FROM
>   (select time, temp FROM tp
> WHERE strftime('%s','2012-02-08 11:37:00')*1>=time
> ORDER BY time DESC LIMIT 1) AS a,
>   (select time, temp FROM tp
> WHERE strftime('%s','2012-02-08 11:37:00')*1 ORDER BY time LIMIT 1) AS b;

Thanks!  I will try to rewrite this as a view.
-- 
Steinar
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-08 Thread Steinar Midtskogen
[Kevin Martin]

> For the quick and dirty solution, I think you can use something like this to 
> create your view. You would need to index time, and even with the index, I'm 
> not too sure about speed.
>
> select x1.time, x1.value, x2.time from x as x1 left join x as x2 on 
> x2.time=(select max(time) from x where time solutions require more complicated queries to account for the fact
> that interpolation is going on.

Yes, but if creating views does the queries fast enough for me, I'll
be pragmatic about this. :)

-- 
Steinar
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-08 Thread Kit
2012/2/8 Steinar Midtskogen :
> 1. I'd like to be able to look up any timestamp between the oldest and
> the newest in the database, and if there is no value stored for that
> timestamp, the value given should be an interpolation of the two
> closest.  So, if the table has:
>
> 1325376000 (Jan 1 2012 00:00:00 UTC) | 5.0
> 1325376300 (Jan 1 2012 00:05:00 UTC) | 10.0
>
> and I do "SELECT temp FROM temperature WHERE unix_time = 1325376120"
> (00:02:00) I should get 7.0.

create table tp (time default (datetime('%s','now')), temp real);

insert into tp values (1328700574,2.5);
insert into tp values (1328701004,4.8);
insert into tp values (1328701060,5.1);
insert into tp values (1328701093,5.2);

select datetime(time,'unixepoch'),temp from tp;

-- 2012-02-08 11:29:34|2.5
-- 2012-02-08 11:36:44|4.8
-- 2012-02-08 11:37:40|5.1
-- 2012-02-08 11:38:13|5.2

SELECT a.temp+(b.temp-a.temp)/(b.time-a.time)*(strftime('%s','2012-02-08
11:37:00')-a.time) FROM
  (select time, temp FROM tp
WHERE strftime('%s','2012-02-08 11:37:00')*1>=time
ORDER BY time DESC LIMIT 1) AS a,
  (select time, temp FROM tp
WHERE strftime('%s','2012-02-08 11:37:00')*1http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-08 Thread Kevin Martin
For the quick and dirty solution, I think you can use something like this to 
create your view. You would need to index time, and even with the index, I'm 
not too sure about speed.

select x1.time, x1.value, x2.time from x as x1 left join x as x2 on 
x2.time=(select max(time) from x where time Related to this thread, I wonder if it's possible to create a view
> which can give me a value from the row immediately above.  E.g. given
> the table:
> 
> unix_time  val
> --+---
> 1325376000|val1
> 1325376300|val2
> 1325376600|val3
> 1325376900|val4
> 
> (the first column is a unix timestamp and unique)
> 
> can I create a view which gives me:
> 
> unix_time  val  prev_unix_time
> --++--
> 1325376000|val1|
> 1325376300|val2|1325376000
> 1325376600|val3|1325376300
> 1325376900|val4|1325376600
> 
> Something like this will not work:
> 
> create view new as select unix_time, val, (select unix_time from old where 
> new.unix_time < old.unix_time order by unix_time desc limit 1) as 
> prev_unix_time from old;
> 
> as I can't refer to new.unix_time inside the view that defines "new".
> 
> The idea is, if this is possible, then I should be able to get my
> weighted average by something like this (not verified, but you get the
> idea):
> 
> select sum(val * (unix_time - prev_unix_time)) / sum(unix_time - 
> prev_unix_time) from new;
> 
> -- 
> Steinar
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-08 Thread Steinar Midtskogen
Related to this thread, I wonder if it's possible to create a view
which can give me a value from the row immediately above.  E.g. given
the table:

unix_time  val
--+---
1325376000|val1
1325376300|val2
1325376600|val3
1325376900|val4

(the first column is a unix timestamp and unique)

can I create a view which gives me:

unix_time  val  prev_unix_time
--++--
1325376000|val1|
1325376300|val2|1325376000
1325376600|val3|1325376300
1325376900|val4|1325376600

Something like this will not work:

 create view new as select unix_time, val, (select unix_time from old where 
new.unix_time < old.unix_time order by unix_time desc limit 1) as 
prev_unix_time from old;

as I can't refer to new.unix_time inside the view that defines "new".

The idea is, if this is possible, then I should be able to get my
weighted average by something like this (not verified, but you get the
idea):

 select sum(val * (unix_time - prev_unix_time)) / sum(unix_time - 
prev_unix_time) from new;

-- 
Steinar
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-08 Thread Steinar Midtskogen
[Roger Binns]

> I'd recommend you write code in your application first that knows how to
> calculate the values you want.  That way you can ensure the calculations
> are correct, you have something for test harnesses that produces "good"
> values and you have something to port to your final solution.
>
> I'd probably stop there.  Remember that your application code and the
> SQLite library are running in the same process.  It is almost certain that
> it is (better/more convenient/easier to develop and use) for this code to
> be app code than within SQLite.

OK.  But then I wont have an SQL interface for accessing interpolated
data.  It's acceptable.  An SQL interface is mostly a "nice to have",
so I could do some quick queries in SQL instead of having to write C
code for it.

> If someone will be doing queries expecting to match a row with second
> granularity then your SQLite side solutions are virtual tables and
> functions.  The former is well documented.  For the latter you can make a
> function like "reading" which behind the scenes calls your app code which
> prepares a statement, finds neighbouring readings and returns the
> interpolated result - eg `select reading("2012-01-01T012345")`

So, if I go for that approach, you'd recommend that I add functions,
such as "reading", and if I want a mean temperature, I should add a
function "mean" and not try to change what AVG will do?

Thanks,
-- 
Steinar
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Interpolation

2012-02-07 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/02/12 23:14, Steinar Midtskogen wrote:
> I'm not asking the list to solve these problems for me, but it would 
> greatly help if anyone could point me to the right direction.  Which 
> approach would work?

I'd recommend you write code in your application first that knows how to
calculate the values you want.  That way you can ensure the calculations
are correct, you have something for test harnesses that produces "good"
values and you have something to port to your final solution.

I'd probably stop there.  Remember that your application code and the
SQLite library are running in the same process.  It is almost certain that
it is (better/more convenient/easier to develop and use) for this code to
be app code than within SQLite.

If you need other code to access the database and your application code
cannot be used then you can use one of the mechanisms you mentioned such
as virtual tables.  However it has been my experience that faking data
integrity for other code tends to lead to bugs and other issues in that
code because they don't realise what is going on.  (The initial programmer
might, but in 6 months someone else does maintenance.)

If someone will be doing queries expecting to match a row with second
granularity then your SQLite side solutions are virtual tables and
functions.  The former is well documented.  For the latter you can make a
function like "reading" which behind the scenes calls your app code which
prepares a statement, finds neighbouring readings and returns the
interpolated result - eg `select reading("2012-01-01T012345")`

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk8yKgAACgkQmOOfHg372QQxBgCfd07QCC8/f2caqq4d0EZLHg67
d0UAoKcm/eVV/MsUdGk+XgHnXuDWXOLJ
=OGRj
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Interpolation

2012-02-07 Thread Steinar Midtskogen
Hello

I have a large (> 1GB) collection of text files containing sensor
values associated with timestamps, and I want to switch to an sqlite
database.  For simplicity, let's assume that I have two tables, one
that stores temperatures and another that stores relative_humidity:

CREATE TABLE temperature (
  temp REAL,
  unix_time INTEGER,
  PRIMARY KEY (unix_time)
);

CREATE TABLE relative_humidity (
  rh REAL,
  unix_time INTEGER,
  PRIMARY KEY (unix_time)
);

What I need to solve is this:

1. I'd like to be able to look up any timestamp between the oldest and
the newest in the database, and if there is no value stored for that
timestamp, the value given should be an interpolation of the two
closest.  So, if the table has:

1325376000 (Jan 1 2012 00:00:00 UTC) | 5.0
1325376300 (Jan 1 2012 00:05:00 UTC) | 10.0

and I do "SELECT temp FROM temperature WHERE unix_time = 1325376120"
(00:02:00) I should get 7.0.

2. I'd like to calculate averages for certain periods.  Now, I could
use AVG if everything was stored in fixed intervals, but that's not
the case (which is also the reason why I want the interpolation above
- the user can't know what valid timestamps are).  For instance, let's
say I want the average temperature for two consecutive days.  The
first day was cold and has one value for every hour.  The second was
hot and has one value for every minute.  A plain AVG would not give me
what I'd expect, since the hot day would get far too much weight.  So
when calculating my average, I need to weigth the values depending on
the interval they'll represent.

3. Say that I want to know the dew point for a certain timestamp or
the average dew point for a whole day.  The dew point is calculated
from the temperature and the relative humidity.  The trouble is that
the two tables don't contain the same timestamps, so I can't look up
temp and rh in the tables using the timestamp and then do the
calculations (unless problem 1 has been solved).

In short, if my tables had values for every second, all these problems
would go away.  I could even use AVG and get what I wanted.  Is it
possible to create some kind of virtual table which to the user
appears to have values for every second?


I'm not asking the list to solve these problems for me, but it would
greatly help if anyone could point me to the right direction.  Which
approach would work?

Is it possible to create a VIEW to do any of this?

Could I use sqlite's virtual table functionality?

Or is it best to leave the tables as they are and do what I want to do
in C/C++, that is, to abandon the idea that I can get what I want
using regular SQL statements.

I could, of course, write a program that does all the interpolation
for every second and store the interpolated values in the database,
that would be very simple, but that would also make the database way
too large and slow.

Thanks.

-- 
Steinar Midtskogen
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users