Sven Anderson wrote:
> Hi all,
>
> Sven Anderson, 21.04.2006 21:34:
>> I think the problem is not the updating of the data itself, but updating
>> the complex primary key. An index of (ip_src, ip_dst, stamp_inserted) is
>> fast enough to find entries, and easy enough to maintain.
>
> It seems to be well known that a default MySQL performs a lot better than
> a default PostgreSQL. One difference is that PostgreSQL does a COMMIT
> after each INSERT/UPDATE, which results in an fsync() call if fsync =
> true in postgresql.conf. See this article:
> http://monstera.man.poznan.pl/wiki/index.php/Mysql_vs_postgres

Disabling fsync() does improve performance. I did this once for a pmacct
database, but was bitten horribly after the server was rebooted a couple of
times without being shut down properly (power/UPS failure etc.). So don't
do that.
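A safer way to cut down on fsync() overhead, without risking corruption, is to batch many INSERTs into one explicit transaction so that only the final COMMIT forces a disk flush. A minimal sketch, assuming a pmacct-style `acct` table with `ip_src`, `ip_dst`, `bytes`, and `stamp_inserted` columns (names here are illustrative, check your actual schema):

```sql
-- Instead of disabling fsync, amortize its cost: one transaction,
-- one COMMIT, one fsync() for the whole batch of rows.
BEGIN;
INSERT INTO acct (ip_src, ip_dst, bytes, stamp_inserted)
    VALUES ('10.0.0.1', '10.0.0.2', 1500, '2006-04-21 20:00:00');
INSERT INTO acct (ip_src, ip_dst, bytes, stamp_inserted)
    VALUES ('10.0.0.3', '10.0.0.4', 40, '2006-04-21 20:00:00');
-- ... more rows ...
COMMIT;
```

If the machine crashes mid-batch, the whole transaction simply rolls back; you lose the uncommitted batch, but the database stays consistent, which is exactly what fsync = false does not guarantee.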
When I was doing PostgreSQL optimization, I put some links on that topic in:
http://www.nyetwork.org/wiki/PostgreSQL

There are some simple things that can be done in PostgreSQL to really
improve performance. My main tips are:

- only keep columns in the pmacct tables that are actually being used
- check datatypes to see if they can be tightened up
- add optimal indexes based on the queries pmacctd uses to update the tables
- add optimal indexes based on the queries SQL reports (eg web pages) use
  to select data
- remove unused indexes
- run all SELECT & UPDATE statements under "EXPLAIN ANALYZE ..." to see if
  they are actually hitting the indexes
- sometimes running "SET enable_seqscan = off;" before a SELECT query can
  make a big difference

>> BTW: What is the best indexing to make a select with "WHERE
>> stamp_inserted >= '2006-04-21 20:17:55' AND stamp_inserted <
>> '2006-04-21 21:17:55'" fast? I guess a btree index, but maybe there's
>> something better?
>
> This question is still open. No DBMS gurus here? ;-)

Are you actually doing queries like this, or is this just an example? What
you could do is create an index based on a GROUP BY clause, eg group by
date_trunc('hour', stamp_inserted). Then it will be very fast to find the
records from a given hour, and a sequential scan for the records within
those selected hours should be pretty quick.

I have counter tables with 7+ million records in Pg, and no complaints at
all from the people using the web apps that hit them.

Having one big table for everything of course makes queries easier, but
often people are only doing hour-to-hour queries within the same day, and
generally I found that most reports people run are per-month totals by IP
address. Therefore I created acct_monthly, acct_daily, and acct_hourly
tables. acct_monthly is blazing fast because it has barely any records in
it. acct_daily has far fewer records than acct_hourly.
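The two indexing approaches above can be sketched as follows. This is a hypothetical example assuming a pmacct-style `acct` table with `ip_src`, `bytes`, and `stamp_inserted` columns; the index and table names are made up for illustration:

```sql
-- A plain btree index answers the quoted range query
-- (stamp_inserted >= X AND stamp_inserted < Y) directly:
CREATE INDEX acct_stamp_idx ON acct (stamp_inserted);

-- An expression index on the hour bucket supports queries that
-- group and filter on the same date_trunc() expression:
CREATE INDEX acct_hour_idx ON acct (date_trunc('hour', stamp_inserted));

-- Verify the planner actually uses the index rather than a
-- sequential scan:
EXPLAIN ANALYZE
SELECT date_trunc('hour', stamp_inserted) AS hour,
       ip_src,
       sum(bytes) AS total_bytes
FROM acct
WHERE date_trunc('hour', stamp_inserted) = '2006-04-21 20:00:00'
GROUP BY hour, ip_src;
```

Note that the expression index is only considered when the query's WHERE or GROUP BY uses the exact same expression, which is why grouping on date_trunc('hour', ...) consistently, rather than on raw timestamps, pays off.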
Although acct_daily does have more records than acct_monthly, finding
totals per day is still fast, since it has indexes grouped on that
date_trunc() function's results. This method uses some extra disk space and
causes some extra CPU load, because there are three times as many pmacctd
processes running, but overall it's worth it: queries are always fast, and
pmacctd is never backlogged trying frantically to commit.

Wim

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
