Sven Anderson wrote:
> Hi all,
>
> Sven Anderson, 21.04.2006 21:34:
>   
>> I think the problem is not the updating of the data itself, but updating
>> the complex primary key. An index of (ip_src, ip_dst, stamp_inserted) is
>> fast enough to find entries, and easy enough to maintain.
>>     
>
> it seems to be well known that a default MySQL installation performs a
> lot better than a default PostgreSQL installation. One difference is
> that PostgreSQL does a COMMIT after each INSERT/UPDATE, which results
> in an fsync() call if fsync = true in postgresql.conf. See this article:
> http://monstera.man.poznan.pl/wiki/index.php/Mysql_vs_postgres
>   
Disabling fsync() does improve performance. I did this once for a Pmacct 
database, but was bitten horribly after the server was rebooted a couple 
of times without being shut down properly (power/UPS failure etc.). So 
don't do that.
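
For reference, a minimal way to check that setting from psql (the
parameter itself lives in postgresql.conf and should stay at its
default):

    -- verify the current setting; it should report "on"
    SHOW fsync;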

When I was doing PostgreSQL optimization, I put some links on that 
topic here:
http://www.nyetwork.org/wiki/PostgreSQL

There are some simple things that can be done in PostgreSQL to really 
improve performance.

My main tips are:

- only keep columns in the pmacct tables that are actually being used
- check datatypes to see if they can be tightened up
- create optimal indexes based on the queries pmacctd uses to update 
the tables
- create optimal indexes based on the queries used by SQL reports (e.g. 
web pages) to select data
- remove unused indexes
- run all SELECTs and UPDATEs under "explain analyze ..." to see if they 
are actually hitting the indexes
- sometimes running "set enable_seqscan=no;" before a SELECT query can 
make a big difference (see the sketch right after this list)
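
As a rough sketch of the last three points (the acct table and the
ip_src/ip_dst/stamp_inserted/bytes columns follow the pmacct defaults
discussed above; the report query is only a placeholder for whatever
your pages actually run):

    -- see whether the planner actually uses an index for a report query
    EXPLAIN ANALYZE
      SELECT ip_src, SUM(bytes)
      FROM acct
      WHERE stamp_inserted >= '2006-04-21 20:00:00'
        AND stamp_inserted <  '2006-04-21 21:00:00'
      GROUP BY ip_src;

    -- if it falls back to a sequential scan, an index matching the
    -- WHERE clause usually helps
    CREATE INDEX acct_stamp_idx ON acct (stamp_inserted);

    -- and, as a quick test only, steer the planner away from
    -- sequential scans for the current session
    SET enable_seqscan = off;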
>> BTW: What is the best indexing to make a select with "WHERE
>> stamp_inserted>=2006-04-21 20:17:55 AND stamp_inserted<2006-04-21
>> 21:17:55" fast? I guess a btree index, but maybe there's something better?
>>     
>
> This question is still open. No DBMS gurus here? ;-)
>   
Are you actually doing queries like this, or is this just an example?

What you could do is add an index based on a GROUP BY clause, e.g. 
group by date_trunc('hour', stamp_inserted). Then it will be very fast 
to find the records from a given hour, and a sequential scan over the 
records in those selected hours should be pretty quick.
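
A minimal sketch of that idea, assuming stamp_inserted is a plain
timestamp (without time zone, so date_trunc() can be indexed) and using
made-up index names:

    -- functional index that groups rows by the hour they belong to
    CREATE INDEX acct_hour_idx
      ON acct (date_trunc('hour', stamp_inserted));

    -- a query phrased against the same expression can then use it
    SELECT ip_src, ip_dst, SUM(bytes)
    FROM acct
    WHERE date_trunc('hour', stamp_inserted) = '2006-04-21 20:00:00'
    GROUP BY ip_src, ip_dst;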

I have counter tables with 7+ million records in Pg, and no complaints 
at all from the people using the web apps that query them.

Having one big table for everything of course makes queries easier. But 
often people are only doing hour-to-hour queries within the same day, 
and generally I found that most reports people run are per-month totals 
by IP address. Therefore I created acct_monthly, acct_daily, and 
acct_hourly tables. acct_monthly is blazing fast because it has barely 
any records in it. acct_daily has far fewer records than acct_hourly; 
although it does have more than acct_monthly, finding per-day totals is 
still fast since it has indexes that group on that date_part() 
function's results. This method uses some extra disk space and causes 
some extra CPU load because there are three times as many pmacctd 
processes running, but overall it's worth it since queries are always 
fast and pmacctd is never backlogged, frantically trying to commit...
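
A sketch of that layout (table and index names here are illustrative,
I'm using date_trunc() for the grouping expression, and the per-month
report assumes the usual pmacct column names):

    -- per-day grouping index on the daily table
    CREATE INDEX acct_daily_day_idx
      ON acct_daily (date_trunc('day', stamp_inserted));

    -- the typical per-month-by-IP report then runs against the tiny
    -- monthly table instead of the big hourly one
    SELECT ip_src, SUM(bytes) AS bytes
    FROM acct_monthly
    WHERE stamp_inserted >= '2006-04-01'
      AND stamp_inserted <  '2006-05-01'
    GROUP BY ip_src;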

Wim
