Thanks for that, looks like something to sink my teeth into ! On 21 June 2016 at 13:29, Alban Hertroys <haram...@gmail.com> wrote: > >> On 19 Jun 2016, at 10:58, Tim Smith <randomdev4+postg...@gmail.com> wrote: >> >> Hi, >> >> My postgresql-fu is not good enough to write a query to achieve this >> (some may well say r is a better suited tool to achieve this !). >> >> I need to calculate what I would call a correlation window on a time >> series of data, my table looks like this : >> >> create table data(data_date date,data_measurement numeric); >> insert into data values('2016-01-01',16.23); >> <etc> >> insert into data values('2016-06-19',30.54); >> >> My "target sample" would be the N most recent samples in the table >> (e.g. 20, the most recent 20 days) >> >> My "potential sample" would be a moving window of size N (the same >> size N as above), starting at T0 (i.e. 2016-01-01 in this example) and >> incrementing by one (i.e. 2016-01-01 + 20, then 2016-01-02+20 etc), >> but the "target sample" would obviously be excluded. >> >> The output needs to display window date range (or at least the start >> date of the "potential sample" window) and the result >> corr(target,potential). >> >> Hope that makes sense > > Something like this could do the trick (untested): > > with recursive sample (nr, start_date) as ( > select 1 as nr, data_date as start_date, > SUM(data_measurement) as total > from generate_series(0, 19) range(step) > left join data on (data_date = start_date + range.step) > > union all > > select nr + 1, sample.start_date +1, SUM(data_measurement) as > total > from sample > join generate_series(0, 19) range(step) > left join data on (data_date = start_date +1 + range.step) > where start_date +1 +19 <= (select MAX(data_date) from data) > group by 1, 2 > ) > select * from sample where start_date >= '2016-01-01'; > > Not sure how best to go about parameterising sample size N, a stored function > seems like a good option. > > > Another approach would be to move a (cumulative) window-function with 20 > items over your data set and for each row subtract the first value of the > previous window from the total of the current window (that is, assuming > you're calculating a SUM of data_measurement for each window of 20 records). > > Visually that looks something like this for sample size 4: > sample 1: (A + B + C + D) > sample 2: (A + B + C + D) + E - A = (B + C + D + E) > sample 3: (B + C + D + E) + F - B = (C + D + E + F) > etc. > > To accomplish this, you calculate two cumulative totals (often misnamed as > running totals, but AFAIK that's something different), one from the start, > and one lagging N rows behind (you can use the lag() window function for > that) and subtract the two. > > Good luck! > > Alban Hertroys > -- > If you can't see the forest for the trees, > cut the trees and you'll find there is no forest. >
-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general