Hi Wim,

Wim Kerkhoff, 04.05.2006 06:09:
> Disabling fsync() does improve performance. I did this once for a Pmacct 
> database, but was bitten horribly after the server was rebooted a couple 
> of times without being shutdown properly (power/UPS failure etc). So 
> don't do that.

You mean: all the data was gone, not only the recent one?

> - enable optimal indexes based on queries being used by pmacctd to 
> update the tables

I disabled updates at all. But anyway, with the default standard primary
key index over all the update fields there was no chance. Now I have
already over 18 million rows with these indexes:

Indexes:
    "acct_v5_idx" btree (ip_src, ip_dst, stamp_inserted)
    "stamp_inserted_idx" btree (stamp_inserted)

and all the queries are fast enough as long as a time window is selected,
which is not too big. But even adding a hash-table for ip_src was creating
trouble. So my problem is definitely correlated with "too much indexes".

> - remove unused indexes

I would add "or not so important indexes". ;-)

> - run all SELECT & UPDATE under "explain analyze ... " to see if they 
> are actually hitting the indexes

Good hint.

>>> BTW.: What is the best indexing to make a select with "WHERE
>>> stamp_inserted>=2006-04-21 20:17:55 AND stamp_inserted<2006-04-21
>>> 21:17:55" fast? I guess a btree index, but maybe there's something better?
>>>     
>> This question is still open. No DBMS gurus here? ;-)
>>   
> Are you actually doing queries like this, or is this just an example?

Yes, I'm doing grouped sums in certain time slots, where I use queries
like this:
SELECT src_ip,sum(bytes) FROM acct_v5 GROUP BY src_ip ORDER BY sum(bytes)
DESC WHERE stamp_inserted>=2006-04-21 20:17:55 AND
stamp_inserted<2006-04-21 21:17:55";

> What you could do is an an index based on a group clause. Eg, group by 
> date_trunc('hour', stamp_inserted). Then it will be very fast to find 
> the records from the hour. Then a sequential scan for records in those 
> selected hours should be pretty quick.

Will it also intelligently use this index when doing "WHERE
stamp_inserted>=2006-04-21 20:17:55 AND stamp_inserted<2006-04-21
21:17:55", so that it looks only in the two touched hours?

> I have counter tables with 7+ million records in Pg, and no complaints 
> at all from the people hitting web apps that are hitting them.

> Therefore I created an acct_monthly, 
> acct_daily, and acct_hourly tables. acct_monthly is blazing fast because 
> it has barely any records in it. acct_daily has much less records then 
> acct_hourly.

Sounds like the different resolution time lines that are used to be
created with RRDTool.


Cheers,

Sven

-- 
Sven Anderson
Institute for Informatics - http://www.ifi.informatik.uni-goettingen.de
Georg-August-Universitaet Goettingen
Lotzestr. 16-18, 37083 Goettingen, Germany

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists

Reply via email to