Re: [PERFORM] Calculating 95th percentiles

2011-03-31 Thread Landreville
On Sat, Mar 5, 2011 at 7:34 PM, marcin mank marcin.m...@gmail.com wrote:
 Is this fast enough on a slave:


 with deltas as (select * from get_delta_table(...)),
 p95 as(select round(count(volume_id) * 0.95) as p95v from deltas)
 select
 (select in_rate from deltas, p95 where
 in_rate_order = p95v),
 (select out_rate from deltas, p95 where
 out_rate_order = p95v)
 etc..
 Greetings
 Marcin


I really didn't know you could use a with statement on a read-only
database -- I don't think I even knew the with statement existed in
Postgres (is it documented somewhere?). I will try this out.

I am also looking at Pierre's suggestion of calculating the delta
value on insert. To do this I am going to update all the rows
currently in the partitioned tables. Does anyone know if this will
still use constraint exclusion in the correlated subquery or will it
scan every partitioned table for each updated row?:

update volume
set in_delta =  in_octets - vprev.in_octets,
out_delta =  out_octets - vprev.out_octets
from volume vprev
where vprev.insert_timestamp =
(select max(insert_timestamp) from volume  v
 where v.switch_port_id = volume.switchport_id
 andv.insert_timestamp  volume.insert_timestamp);

I suppose I can check with an analyze before I execute it (I still
have to alter the table to add the delta columns).

Thanks,

Landreville

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Calculating 95th percentiles

2011-03-05 Thread Pierre C



Any time the volume tables are queried it is to calculate the deltas
between each in_octets and out_octets from the previous row (ordered
by timestamp). The delta is used because the external system, where
the data is retrieved from, will roll over the value sometimes.  I
have a function to do this calcuation:


Would it be possible to do this when inserting and store the deltas  
directly ?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Calculating 95th percentiles

2011-03-05 Thread marcin mank
On Fri, Mar 4, 2011 at 4:18 PM, Landreville
landrevi...@deadtreepages.com wrote:

    create temporary table deltas on commit drop as
        select * from get_delta_table(p_switchport_id, p_start_date,
 p_end_date);

    select round(count(volume_id) * 0.95) into v_95th_row from deltas;
    select in_rate into v_record.in_95th from deltas where
 in_rate_order = v_95th_row;
    select out_rate into v_record.out_95th from deltas where
 out_rate_order = v_95th_row;
    select sum(in_delta), sum(out_delta) into v_record.in_total,
 v_record.out_total from deltas;

 Unfortunately using a temporary table means that I cannot run this
 query on the read-only slave, but I can't see a way around using one.

Is this fast enough on a slave:


with deltas as (select * from get_delta_table(...)),
p95 as(select round(count(volume_id) * 0.95) as p95v from deltas)
select
(select in_rate from deltas, p95 where
in_rate_order = p95v),
(select out_rate from deltas, p95 where
out_rate_order = p95v)
etc..

?

Greetings
Marcin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Calculating 95th percentiles

2011-03-04 Thread Landreville
This is not a performance bug -- my query takes a reasonably long
amount of time, but I would like to see if I can get this calculation
any faster in my setup.

I have a table:
volume_id serial primary key
switchport_id integer not null
in_octets bigint not null
out_octets bigint not null
insert_timestamp timestamp default now()
with indexes on volume_id, switchport_id, insert_timestamp.

That is partitioned into about 3000 tables by the switchport_id (FK to
a lookup table), each table has about 30 000 rows currently (a row is
inserted every 5 minutes into each table).
I have select queries that filter based on switchport_id and
timestamp. Constraint exclusion is used with the switchport_id to get
the right table and the insert_timestamp has an index on it (on each
table).

Any time the volume tables are queried it is to calculate the deltas
between each in_octets and out_octets from the previous row (ordered
by timestamp). The delta is used because the external system, where
the data is retrieved from, will roll over the value sometimes.  I
have a function to do this calcuation:


create or replace function traffic.get_delta_table(p_switchport_id integer,
p_start_date date, p_end_date date)
returns table(  volume_id integer,
insert_timestamp timestamp,
out_delta bigint,
out_rate bigint,
out_rate_order bigint,
in_delta bigint,
in_rate bigint,
in_rate_order bigint)
as $$
declare
begin
-- we need to force pgsql to make a new plan for each query so it can
-- use constraint exclusions on switchport id to determine the
partition table to scan
return query execute 'select
t.volume_id,
t.insert_timestamp,
t.out_delta,
t.out_delta * 8 / t.time_difference as out_rate,
row_number() over (order by t.out_delta * 8 /
t.time_difference) as out_rate_order,
t.in_delta,
t.in_delta * 8 / t.time_difference as in_rate,
row_number() over(order by t.in_delta * 8 / t.time_difference)
as in_rate_order
from
(select
n.volume_id,
n.insert_timestamp,
extract(epoch from (n.insert_timestamp -
lag(n.insert_timestamp,1,n.insert_timestamp) over(order by
n.insert_timestamp)))::integer as time_difference,
case
when n.out_octets  lag(out_octets,1,n.out_octets)
over(order by n.insert_timestamp)
then n.out_octets
else n.out_octets - lag(out_octets,1,n.out_octets)
over(order by n.insert_timestamp)
end as out_delta,
case
when n.in_octets  lag(in_octets,1,n.in_octets) over(order
by n.insert_timestamp)
then n.in_octets
else n.in_octets - lag(in_octets,1,n.in_octets) over(order
by n.insert_timestamp)
end as in_delta
from volume as n
where n.insert_timestamp between $1 and $2
and n.switchport_id = '||p_switchport_id||'
and in_octets != 0
and out_octets != 0
) as t
where time_difference is not null and time_difference != 0' using
p_start_date, p_end_date;

end; $$ language plpgsql;

The query inside the function's plan:
 WindowAgg  (cost=2269.62..2445.35 rows=6390 width=32) (actual
time=7526.526..7531.855 rows=6622 loops=1)
   -  Sort  (cost=2269.62..2285.60 rows=6390 width=32) (actual
time=7526.497..7527.924 rows=6622 loops=1)
 Sort Key: (((t.in_delta * 8) / t.time_difference))
 Sort Method:  external sort  Disk: 432kB
 -  WindowAgg  (cost=1753.90..1865.72 rows=6390 width=32)
(actual time=2613.593..2618.727 rows=6622 loops=1)
   -  Sort  (cost=1753.90..1769.87 rows=6390 width=32)
(actual time=2613.566..2614.550 rows=6622 loops=1)
 Sort Key: (((t.out_delta * 8) / t.time_difference))
 Sort Method:  quicksort  Memory: 710kB
 -  Subquery Scan on t  (cost=978.89..1350.00
rows=6390 width=32) (actual time=2582.254..2606.708 rows=6622 loops=1)
   Filter: ((t.time_difference IS NOT NULL)
AND (t.time_difference  0))
   -  WindowAgg  (cost=978.89..1269.32
rows=6454 width=28) (actual time=2582.243..2596.546 rows=6623 loops=1)
 -  Sort  (cost=978.89..995.03
rows=6454 width=28) (actual time=2582.120..2583.172 rows=6623 loops=1)
   Sort Key: n.insert_timestamp
   Sort Method:  quicksort  Memory: 710kB
   -  Result  (cost=8.87..570.49
rows=6454 width=28) (actual time=1036.720..2576.755 rows=6623 loops=1)
 -  Append
(cost=8.87..570.49 rows=6454 width=28) (actual time=1036.718..2574.719
rows=6623 loops=1)
   -  Bitmap Heap
Scan on volume n  (cost=8.87..12.90 rows=1 width=28) (actual
time=0.055..0.055