Re: [pmacct-discussion] reloading config accuracy

2009-09-25 Thread Wim Kerkhoff

On 9/25/2009 4:03 PM, Aaron Glenn wrote:

On Fri, Sep 25, 2009 at 4:59 AM, Tony <td_mi...@yahoo.com> wrote:

Is there a way to sort it properly by IP address (so that .2 comes after
.1) in either an SQL query or in an XLS sheet?

I hesitate to be 'that guy' but, you should look at using PostgreSQL.
I don't know enough about MySQL to make any suggestions specific to
it.

I totally agree; PostgreSQL handles network data types much better than 
MySQL. It will maintain proper indexes on the network data types.
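
For instance (a sketch, assuming a table named acct with ip_src stored 
as the inet type):

-- inet compares numerically, so .2 sorts after .1 with a plain ORDER BY
SELECT ip_src, sum(bytes) AS total_bytes
FROM acct
GROUP BY ip_src
ORDER BY ip_src;

-- an ordinary btree index on the inet column supports that ordering
CREATE INDEX acct_ip_src_idx ON acct (ip_src);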


You can do some sorting in MySQL based on IP:

http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_inet-aton

But I believe this approach will not use any of the indexes.
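
For example (a sketch; this assumes ip_src is stored as a dotted-quad 
string):

-- INET_ATON('10.0.0.2') returns the address as a 32-bit integer, so
-- the ordering is numeric rather than lexical
SELECT ip_src, SUM(bytes) AS total_bytes
FROM acct
GROUP BY ip_src
ORDER BY INET_ATON(ip_src);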

Wim



Re: [pmacct-discussion] SQL Performance

2006-06-05 Thread Wim Kerkhoff
Peter Nixon wrote:
 As you can see the machine is not waiting on disk, but rather on the locks. 
 Before I go about trying to optimise this any further, is there something 
 basic I have wrong that can speed this up?

The configuration looks fine. It seems like there are no indexes, or 
something similar, causing the UPDATEs to take forever.

It could also be that there are a lot of counters. Multiply 500 hosts, 
2 directions, 3 protocols (ICMP/TCP/UDP), server ports (10 or so), and 
client ports (hundreds): 500 x 2 x 3 x 10 x 100 is already 3,000,000 
possible combinations, so you could easily be trying to update hundreds 
of thousands of records every 10 minutes.

Turn on debug, redirect the output of nfacctd to a file, and watch to 
see how many updates are actually being done.
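
A minimal sketch of that (the paths here are made up):

debug: true

Then run the daemon in the foreground and count the statements, e.g.:

nfacctd -f /etc/pmacct/nfacctd.conf > /tmp/nfacctd-debug.log 2>&1
grep -c UPDATE /tmp/nfacctd-debug.log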

Wim



Re: [pmacct-discussion] Putting inbound and outbound packets into same table

2006-06-01 Thread Wim Kerkhoff
Peter Nixon wrote:
 For my purposes the number of bytes for in and out could be summed (which 
 would fit with this system just fine), but it could also be useful to have 
 in_bytes and out_bytes in separate columns...

 Does anyone have any comments on this? Is there an easy way to do this 
 currently?

Here's a late reply to your message. I'm not exactly sure what your goal 
is, but I'm doing something similar by storing counters for each 
direction in the same PostgreSQL table. Here are the config and SQL 
queries. No extra columns are required, and yet it's easy to 
differentiate in vs. out traffic.

Config is something like this:

aggregate[in]: dst_host
aggregate[out]: src_host
aggregate_filter[in]:  dst net 10.11.0.0/16 and not src net 10.11.0.0/16
aggregate_filter[out]: src net 10.11.0.0/16 and not dst net 10.11.0.0/16
plugins: pgsql[in], pgsql[out]
sql_data: frontend
sql_table: acct_monthly
sql_history: 1M
sql_history_roundoff: h


Query to show summary by period (in this case, a month which is passed 
in via DBI parameters):

SELECT
    coalesce(ip_dst, ip_src) as "IP",
    bytes2string(sum(bytes)) as "Total MB",
    bytes2string(sum(bytes(ip_dst, bytes))) as "MB In",
    bytes2string(sum(bytes(ip_src, bytes))) as "MB Out"
from acct_monthly
WHERE
    date_trunc('month', stamp_inserted)::timestamp = to_date(?, 'Month')::timestamp
group by
    coalesce(ip_dst, ip_src)
order by
    coalesce(ip_dst, ip_src) asc;



-- custom SQL functions

-- Format a byte count as a megabyte string with thousands separators.
-- (Dividing by 1024.0 keeps the fractional part; plain integer
-- division would always round down to whole MB.)
create or replace function bytes2string(bigint) returns varchar as '
    select to_char(round($1 / 1024.0 / 1024.0, 2), ''999,999,999,999.99'') as result;
' LANGUAGE SQL;

-- Return the byte counter when the address column is non-null, else 0;
-- used above to split a row's counter into the In or Out column.
create or replace function bytes(inet, bigint) returns bigint as '
DECLARE
    in_1 alias for $1;
    in_2 alias for $2;
BEGIN
    if in_1 is not null then
        return in_2;
    else
        return 0;
    end if;
END;
' LANGUAGE plpgsql;
-- end custom SQL functions
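
A quick sanity check of the two helpers (the values are made up):

-- 1572864 bytes = 1.50 MB
SELECT bytes2string(1572864);

-- the counter is attributed only when the address is non-null
SELECT bytes('10.0.0.1'::inet, 100);  -- returns 100
SELECT bytes(NULL::inet, 100);        -- returns 0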


Wim



Re: [pmacct-discussion] PostgreSQL performance

2006-05-03 Thread Wim Kerkhoff
Sven Anderson wrote:
 Hi all,

 Sven Anderson, 21.04.2006 21:34:
 I think the problem is not the updating of the data itself, but updating
 the complex primary key. An index of (ip_src, ip_dst, stamp_inserted) is
 fast enough to find entries, and easy enough to maintain.

 it seems to be known that a default MySQL performs a lot better than a
 default PostgreSQL. One difference is that PostgreSQL does a COMMIT
 after each INSERT/UPDATE, which results in an fsync() call if fsync =
 true in postgresql.conf. See this article:
 http://monstera.man.poznan.pl/wiki/index.php/Mysql_vs_postgres

Disabling fsync() does improve performance. I did this once for a pmacct 
database, but was bitten horribly after the server was rebooted a couple 
of times without being shut down properly (power/UPS failure, etc). So 
don't do that.

When I was doing PostgreSQL optimization, I put some links on that topic 
in:
http://www.nyetwork.org/wiki/PostgreSQL

There are some simple things that can be done in PostgreSQL to really 
improve performance.

My main tips are:

- only keep columns in the pmacct tables that are actually being used
- check datatypes to see if they can be tightened up
- enable optimal indexes based on the queries used by pmacctd to 
update the tables
- enable optimal indexes based on the queries used by SQL reports (eg 
web pages) to select data
- remove unused indexes
- run all SELECT and UPDATE queries under "explain analyze ..." to see 
if they are actually hitting the indexes (see the sketch after this list)
- sometimes running "set enable_seqscan=no;" before a SELECT query can 
make a big difference
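
A sketch of that explain-analyze check in psql (table name and WHERE 
clause are made up):

EXPLAIN ANALYZE
SELECT ip_src, sum(bytes)
FROM acct_hourly
WHERE stamp_inserted >= '2006-04-21 20:00:00'
  AND stamp_inserted < '2006-04-21 21:00:00'
GROUP BY ip_src;

-- if the plan shows a Seq Scan despite a suitable index, compare with:
SET enable_seqscan = off;
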
 BTW: What is the best indexing to make a select with WHERE
 stamp_inserted >= '2006-04-21 20:17:55' AND stamp_inserted < '2006-04-21
 21:17:55' fast? I guess a btree index, but maybe there's something better?

 This question is still open. No DBMS gurus here? ;-)

Are you actually doing queries like this, or is this just an example?

What you could do is add an index based on a GROUP BY clause, e.g. 
group by date_trunc('hour', stamp_inserted). It will then be very fast 
to find the records for the matching hours, and a sequential scan over 
just those rows should be pretty quick.
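
A sketch of such an index (table name assumed):

-- expression index on the truncated hour; queries that filter or group
-- on the exact same expression can use it
CREATE INDEX acct_stamp_hour_idx
    ON acct ((date_trunc('hour', stamp_inserted)));

SELECT date_trunc('hour', stamp_inserted) AS hour, sum(bytes) AS bytes
FROM acct
WHERE date_trunc('hour', stamp_inserted) = '2006-04-21 20:00:00'
GROUP BY date_trunc('hour', stamp_inserted);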

I have counter tables with 7+ million records in Pg, and no complaints 
at all from the people using the web apps that query them.

Having one big table for everything of course makes queries easier. But 
often people are only doing hour-to-hour queries within the same day. 
And generally I found that most reports that people run are for 
per-month totals by IP address. Therefore I created acct_monthly, 
acct_daily, and acct_hourly tables. acct_monthly is blazing fast because 
it has barely any records in it. acct_daily has far fewer records than 
acct_hourly, though more than acct_monthly, yet finding totals per day 
is fast since it has indexes that group on that date_trunc() function's 
results. This method uses some extra disk space and causes some extra 
CPU load because there are three times as many pmacctd processes 
running, but overall it's worth it since queries are always fast, and 
pmacctd is never backlogged trying frantically to commit...
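
A sketch of that three-table setup as pmacct config (the plugin labels 
are arbitrary; the per-plugin keys are indexed the same way as in the 
configs earlier in this thread):

plugins: pgsql[hourly], pgsql[daily], pgsql[monthly]
sql_table[hourly]: acct_hourly
sql_history[hourly]: 1h
sql_history_roundoff[hourly]: h
sql_table[daily]: acct_daily
sql_history[daily]: 1d
sql_history_roundoff[daily]: d
sql_table[monthly]: acct_monthly
sql_history[monthly]: 1M
sql_history_roundoff[monthly]: M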

Wim



Re: [pmacct-discussion] discrepancy between stamp_inserted and stamp_updated ?

2006-02-02 Thread Wim Kerkhoff

Sebastien Guilbaud wrote:

I always have a difference of one hour in my database
between these columns :

   stamp_inserted   stamp_updated
2006-02-02 10:10:00 2006-02-02 09:16:26

I've tried to activate debug mode, and the INSERTs contain
timestamps with a one-hour variation:

INSERT INTO acct_v2 (stamp_updated, stamp_inserted, ip_src, ip_dst, port_src, port_dst, ip_proto, mac_src, mac_dst, 
packets, bytes) VALUES (ABSTIME(1138868666)::Timestamp, ABSTIME(1138872000)::Timestamp, '0.0.0.0', 'x.x.x.x', 0, 0, 0, 
'0:0:0:0:0:0', '0:0:0:0:0:0', 2, 96)


I don't know whether this comes from nfacctd or from a postgres
setting :-(( Is anybody using postgresql 8.1 to store nfacctd
data ??

I ran into some similar symptoms when I was using sql_history to create 
hourly counters without the roundoff parameter - counters had 
stamp_inserted and stamp_updated values in different hours rather than 
in the same hour. Adding sql_history_roundoff fixed that problem, since 
it aligns the start of each time bin (and thus stamp_inserted) to the 
hour boundary...


sql_history: 1h
sql_history_roundoff: h

Wim



[pmacct-discussion] gotcha with recovery log and multiple plugins

2005-10-05 Thread Wim Kerkhoff
I just (painfully) ran into a situation where I had added multiple 
aggregation plugins to a configuration, for example:


! ...
aggregate[in]: dst_host
aggregate[out]: src_host
aggregate_filter[in]: dst net 192.168.1.0/24
aggregate_filter[out]: src net 192.168.1.0/24
plugins: pgsql[in], pgsql[out]
! ...
sql_recovery_logfile: /var/log/pmacct/recovery.dat
! ... EOF

But when trying to recover using pmpgplay, it would recover only records 
that matched the template. So if the template header says that the data 
is for dst_host, then records that have a null ip_dst but a set ip_src 
would fail, as you can't insert an empty IP address.


To prevent this in the future I've fixed my config to do:

sql_recovery_logfile[in]: /var/log/pmacct/recovery-in.dat
sql_recovery_logfile[out]: /var/log/pmacct/recovery-out.dat

I think this is a trap that other people might fall into...

Wim



[pmacct-discussion] agent_id not getting inserted

2005-06-04 Thread Wim Kerkhoff

Hi,

I think this is something trivial and I'm doing something obviously 
wrong, but thought I'd post as I haven't been able to figure it out yet...


My config file has (among other variables):

aggregate[in]: dst_host
aggregate[out]: src_host
sql_optimize_clauses: true
sql_history: 1h
sql_table_version: 2
pmacctd_id: 6

And yet (even with 0.8.6), agent_id=6 is not being inserted. It works 
fine on another installation with a pretty much identical configuration 
file.


Wim



Re: [pmacct-discussion] pmacctd only collecting half of the data being passed.

2005-02-25 Thread Wim Kerkhoff

David Maple wrote:


Hi all,

I have pmacctd running on a dedicated machine listening on a gigabit
Ethernet port.  It is only collecting half of the data that the router
is passing to it.  For example, at this time, the 5 minute average from
the router (confirmed with ifconfig) is 136,087,000 bits/sec and pmacctd
is reporting only 70,212,327 bits/sec.

All I'm trying to do is produce 5 minute usage reports by IP address
for both source and destination traffic.  When I have
sql_dont_try_update disabled, it brings the CPU on the MySQL server to
99%, so I have it enabled until I can put in a stronger MySQL server.  I
have done this with and without sql_optimize_clauses with no difference.
Has anyone else gotten this working with this volume of traffic?  If
so, I would appreciate knowing what you did to get it working.


It's probably worthwhile to investigate whether the updates/inserts are 
actually using the table indexes. Turn on debug in MySQL or pmacctd to 
capture the query that's being run, then check its plan in the mysql 
command-line tool. (EXPLAIN ANALYZE is the PostgreSQL spelling; in 
MySQL, EXPLAIN the equivalent SELECT, since it can't explain an UPDATE 
directly.)
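
A sketch, assuming a v2-style table named acct (the WHERE clause mirrors 
the primary-key columns):

-- MySQL of this vintage can't EXPLAIN an UPDATE, so check the plan of
-- the equivalent SELECT over the same WHERE clause
EXPLAIN SELECT *
FROM acct
WHERE ip_src = '10.0.0.1' AND ip_dst = '10.0.0.2'
  AND port_src = 0 AND port_dst = 0 AND ip_proto = 6
  AND stamp_inserted = '2005-02-25 10:00:00';

-- "type: ALL" in the output means a full table scan: the index isn't
-- being used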


On my PostgreSQL database I have multiple indexes to handle the updates 
plus the queries for the usage reporting.


I found that aggregating by IP src/dst pairs created a crazy amount of 
data. So we're running two aggregates/filters - one to look at inbound 
traffic by destination IP, and another at outgoing traffic by source IP. 
The basic theory is that I don't care where the traffic is going on the 
Internet, just who's responsible for it.


Wim