Re: [GENERAL] Help needed structuring Postgresql correlation query

Tim Smith Tue, 21 Jun 2016 16:05:37 -0700

Thanks for that, looks like something to sink my teeth into !

On 21 June 2016 at 13:29, Alban Hertroys <haram...@gmail.com> wrote:
>
>> On 19 Jun 2016, at 10:58, Tim Smith <randomdev4+postg...@gmail.com> wrote:
>>
>> Hi,
>>
>> My postgresql-fu is not good enough to write a query to achieve this
>> (some may well say r is a better suited tool to achieve this !).
>>
>> I need to calculate what I would call a correlation window on a time
>> series of data, my table looks like this :
>>
>> create table data(data_date date,data_measurement numeric);
>> insert into data values('2016-01-01',16.23);
>> <etc>
>> insert into data values('2016-06-19',30.54);
>>
>> My "target sample" would be the N most recent samples in the table
>> (e.g. 20, the most recent 20 days)
>>
>> My "potential sample" would be a moving window of size N (the same
>> size N as above), starting at T0 (i.e. 2016-01-01 in this example) and
>> incrementing by one (i.e. 2016-01-01 + 20, then 2016-01-02+20 etc),
>> but the "target sample" would obviously be excluded.
>>
>> The output needs to display window date range (or at least the start
>> date of the "potential sample" window) and the result
>> corr(target,potential).
>>
>> Hope that makes sense
>
> Something like this could do the trick (untested):
>
>         with recursive sample (nr, start_date) as (
>                 select 1 as nr, data_date as start_date, 
> SUM(data_measurement) as total
>                 from generate_series(0, 19) range(step)
>                 left join data on (data_date = start_date + range.step)
>
>                 union all
>
>                 select nr + 1, sample.start_date +1, SUM(data_measurement) as 
> total
>                 from sample
>                 join generate_series(0, 19) range(step)
>                 left join data on (data_date = start_date +1 + range.step)
>                 where start_date +1 +19 <= (select MAX(data_date) from data)
>                 group by 1, 2
>         )
>         select * from sample where start_date >= '2016-01-01';
>
> Not sure how best to go about parameterising sample size N, a stored function 
> seems like a good option.
>
>
> Another approach would be to move a (cumulative) window-function with 20 
> items over your data set and for each row subtract the first value of the 
> previous window from the total of the current window (that is, assuming 
> you're calculating a SUM of data_measurement for each window of 20 records).
>
> Visually that looks something like this for sample size 4:
> sample 1: (A + B + C + D)
> sample 2: (A + B + C + D) + E - A = (B + C + D + E)
> sample 3: (B + C + D + E) + F - B = (C + D + E + F)
> etc.
>
> To accomplish this, you calculate two cumulative totals (often misnamed as 
> running totals, but AFAIK that's something different), one from the start, 
> and one lagging N rows behind (you can use the lag() window function for 
> that) and subtract the two.
>
> Good luck!
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Help needed structuring Postgresql correlation query

Reply via email to