Hi Wim,
Wim Kerkhoff, 04.05.2006 06:09:
> Disabling fsync() does improve performance. I did this once for a Pmacct
> database, but was bitten horribly after the server was rebooted a couple
> of times without being shutdown properly (power/UPS failure etc). So
> don't do that.
You mean: all the data was gone, not only the recent one?
> - enable optimal indexes based on queries being used by pmacctd to
> update the tables
I disabled updates at all. But anyway, with the default standard primary
key index over all the update fields there was no chance. Now I have
already over 18 million rows with these indexes:
Indexes:
"acct_v5_idx" btree (ip_src, ip_dst, stamp_inserted)
"stamp_inserted_idx" btree (stamp_inserted)
and all the queries are fast enough as long as a time window is selected,
which is not too big. But even adding a hash-table for ip_src was creating
trouble. So my problem is definitely correlated with "too much indexes".
> - remove unused indexes
I would add "or not so important indexes". ;-)
> - run all SELECT & UPDATE under "explain analyze ... " to see if they
> are actually hitting the indexes
Good hint.
>>> BTW.: What is the best indexing to make a select with "WHERE
>>> stamp_inserted>=2006-04-21 20:17:55 AND stamp_inserted<2006-04-21
>>> 21:17:55" fast? I guess a btree index, but maybe there's something better?
>>>
>> This question is still open. No DBMS gurus here? ;-)
>>
> Are you actually doing queries like this, or is this just an example?
Yes, I'm doing grouped sums in certain time slots, where I use queries
like this:
SELECT src_ip,sum(bytes) FROM acct_v5 GROUP BY src_ip ORDER BY sum(bytes)
DESC WHERE stamp_inserted>=2006-04-21 20:17:55 AND
stamp_inserted<2006-04-21 21:17:55";
> What you could do is an an index based on a group clause. Eg, group by
> date_trunc('hour', stamp_inserted). Then it will be very fast to find
> the records from the hour. Then a sequential scan for records in those
> selected hours should be pretty quick.
Will it also intelligently use this index when doing "WHERE
stamp_inserted>=2006-04-21 20:17:55 AND stamp_inserted<2006-04-21
21:17:55", so that it looks only in the two touched hours?
> I have counter tables with 7+ million records in Pg, and no complaints
> at all from the people hitting web apps that are hitting them.
> Therefore I created an acct_monthly,
> acct_daily, and acct_hourly tables. acct_monthly is blazing fast because
> it has barely any records in it. acct_daily has much less records then
> acct_hourly.
Sounds like the different resolution time lines that are used to be
created with RRDTool.
Cheers,
Sven
--
Sven Anderson
Institute for Informatics - http://www.ifi.informatik.uni-goettingen.de
Georg-August-Universitaet Goettingen
Lotzestr. 16-18, 37083 Goettingen, Germany
_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists