Sven Anderson wrote:
> Hi all,
>
> I want to share with you an interesting experience I just had.
>
> I did the following SELECT:
>
> SELECT "ip_dst",SUM("bytes"),SUM("packets"),SUM("flows") FROM
> "tcom_v5_20060530" WHERE "stamp_inserted">='2006-05-30 00:00:00' AND
> "stamp_inserted"<'2006-05-30 02:00:00' AND "port_src"='53' AND
> "port_dst"='53' AND "ip_src"='xxx.xxx.xxx.xxx' GROUP BY "ip_dst" ORDER BY
> SUM("bytes") DESC LIMIT '10';
>
> which seemed to never end (>30 minutes).
>
> I had only two indexes on this table:
> "tcom_v5_20060530_idx" btree (ip_src, ip_dst, stamp_inserted)
> "tcom_v5_20060530_stamp_idx" btree (stamp_inserted)
>
> After dropping the tcom_v5_20060530_idx, the query was answered in 20 seconds.
>
Try run the query as EXPLAIN SELECT ... to see how it's using indexes.
I have also put some links to resources about PostgreSQL optimization here:
http://nyetwork.org/wiki/PostgreSQL
I think the (ip_src, ip_dst, stamp_inserted) index is probably so large
that it's useless - it's probably has references to every single row.
Your query is only using ip_src, so why have ip_dst in the index? Might
be better to create seperate indexes then.
If you only are doing hourly ranges in the WHERE, then you can make the
index more efficient by doing something like
tcom_v5_20060530_idx2 btree (date_trunc('hour', stamp_inserted), ip_src)
Then change the WHERE statements to use date_trunc as well.
Just some ideas, HTH!
Wim
_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists