Re: [pmacct-discussion] reloading config accuracy
On 9/25/2009 4:03 PM, Aaron Glenn wrote:
> On Fri, Sep 25, 2009 at 4:59 AM, Tony td_mi...@yahoo.com wrote:
>> Is there a way to sort it properly by IP address (so that .2 comes
>> after .1) in either an SQL query or in an XLS sheet?
> I hesitate to be 'that guy' but, you should look at using PostgreSQL.
> I don't know enough about MySQL to make any suggestions specific to it.

I totally agree; PostgreSQL handles network data types much better than
MySQL. It will maintain proper indexes on those network data types. You
can do some sorting in MySQL based on IP:
http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_inet-aton
but I believe this will not use any of the indexes.

Wim
___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
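For reference, a sketch of both approaches (table and column names here
are assumptions based on a typical pmacct schema, not taken from the
original poster's setup):

```sql
-- MySQL: INET_ATON() converts a dotted-quad string like '10.0.0.2' to an
-- integer, so .2 sorts after .1 instead of lexically. Because the
-- function is applied per row, the ORDER BY cannot use an index.
SELECT ip_src, SUM(bytes)
  FROM acct
 GROUP BY ip_src
 ORDER BY INET_ATON(ip_src);

-- PostgreSQL: with the column declared as the native inet type, a plain
-- ORDER BY already sorts numerically and can use a btree index.
SELECT ip_src, SUM(bytes)
  FROM acct
 GROUP BY ip_src
 ORDER BY ip_src;
```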
Re: [pmacct-discussion] SQL Performance
Peter Nixon wrote:
> As you can see the machine is not waiting on disk, but rather on the
> locks. Before I go about trying to optimise this any further, is there
> something basic I have wrong that can speed this up?

The configuration looks fine. It seems like there are no indexes or
something, causing the UPDATEs to take forever. It could also be that
there are a lot of counters. Do the math: 500 hosts, 2 directions, 3
protocols (ICMP/TCP/UDP), ~10 server ports, hundreds of client ports -
and you could easily be trying to update hundreds of thousands of records
every 10 minutes. Turn on debug, redirect the output of nfacctd to a
file, and watch to see how many updates are actually being done.

Wim
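If missing indexes turn out to be the culprit, the index that matters is
one covering the columns pmacct matches on in its UPDATE statements - the
aggregation keys plus stamp_inserted. A minimal sketch, assuming a table
named acct keyed on src/dst host (adjust the column list to mirror your
own 'aggregate' primitives):

```sql
-- Without an index matching the UPDATE's WHERE clause, every UPDATE
-- degenerates into a sequential scan and lock waits pile up.
CREATE INDEX acct_update_idx
    ON acct (ip_src, ip_dst, stamp_inserted);
```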
Re: [pmacct-discussion] Putting inbound and outbound packets into same table
Peter Nixon wrote:
> For my purposes the number of bytes for in and out could be summed
> (which would fit with this system just fine) but it could also be
> useful to have in_bytes and out_bytes in separate columns... Does
> anyone have any comments on this? Is there an easy way to do this
> currently?

Here's a late reply to your message. I'm not exactly sure what your goal
is, but I'm doing something similar by storing counters for each
direction in the same PostgreSQL table. Here are the config and SQL
queries. No extra columns are required, and yet it's easy to
differentiate in vs. out traffic.

Config is something like this:

aggregate[in]: dst_host
aggregate[out]: src_host
aggregate_filter[in]: dst net 10.11.0.0/16 and not src net 10.11.0.0/16
aggregate_filter[out]: src net 10.11.0.0/16 and not dst net 10.11.0.0/16
plugins: pgsql[in], pgsql[out]
sql_data: frontend
sql_table: acct_monthly
sql_history: 1M
sql_history_roundoff: h

Query to show a summary by period (in this case a month, which is passed
in via DBI parameters):

SELECT coalesce(ip_dst, ip_src) AS "IP",
       bytes2string(sum(bytes)) AS "Total MB",
       bytes2string(sum(bytes(ip_dst, bytes))) AS "MB In",
       bytes2string(sum(bytes(ip_src, bytes))) AS "MB Out"
  FROM acct_monthly
 WHERE date_trunc('month', stamp_inserted)::timestamp = to_date(?, 'Month ')::timestamp
 GROUP BY coalesce(ip_dst, ip_src)
 ORDER BY coalesce(ip_dst, ip_src) ASC;

-- custom SQL functions
create or replace function bytes2string(bigint) returns varchar as '
    select to_char(round($1/1024/1024, 2), ''999,999,999,999.99'') as result;
' LANGUAGE SQL;

create or replace function bytes(inet, bigint) returns bigint as '
DECLARE
    in_1 alias for $1;
    in_2 alias for $2;
BEGIN
    if in_1 is not null then
        return in_2;
    else
        return 0;
    end if;
END;
' LANGUAGE plpgsql;
-- end custom SQL functions

Wim
Re: [pmacct-discussion] PostgreSQL performance
Sven Anderson wrote:
> Hi all,
>
> Sven Anderson, 21.04.2006 21:34:
>> I think the problem is not the updating of the data itself, but
>> updating the complex primary key.

An index of (ip_src, ip_dst, stamp_inserted) is fast enough to find
entries, and easy enough to maintain.

> it seems to be known that a default MySQL performs a lot better than a
> default PostgreSQL. One difference is that PostgreSQL does a COMMIT
> after each INSERT/UPDATE, which results in an fsync() call if
> fsync = true in postgresql.conf. See this article:
> http://monstera.man.poznan.pl/wiki/index.php/Mysql_vs_postgres

Disabling fsync() does improve performance. I did this once for a pmacct
database, but was bitten horribly after the server was rebooted a couple
of times without being shut down properly (power/UPS failure etc). So
don't do that.

When I was doing PostgreSQL optimization, I put some links on that topic
at: http://www.nyetwork.org/wiki/PostgreSQL

There are some simple things that can be done in PostgreSQL to really
improve performance. My main tips are:

- only keep columns in the pmacct tables that are actually being used
- check datatypes to see if they can be tightened up
- create optimal indexes based on the queries pmacctd uses to update the
  tables
- create optimal indexes based on the queries SQL reports (eg web pages)
  use to select data
- remove unused indexes
- run all SELECT and UPDATE queries under EXPLAIN ANALYZE to see if they
  are actually hitting the indexes
- sometimes running SET enable_seqscan = off; before a SELECT query can
  make a big difference

> BTW: What is the best indexing to make a select with
> WHERE stamp_inserted >= '2006-04-21 20:17:55'
>   AND stamp_inserted < '2006-04-21 21:17:55'
> fast? I guess a btree index, but maybe there's something better? This
> question is still open. No DBMS gurus here? ;-)

Are you actually doing queries like this, or is this just an example?
What you could do is add an index based on a group clause, e.g.
group by date_trunc('hour', stamp_inserted).
Then it will be very fast to find the records for the hour, and a
sequential scan for records within those selected hours should be pretty
quick. I have counter tables with 7+ million records in Pg, and no
complaints at all from the people hitting the web apps that query them.

Having one big table for everything of course makes queries easier. But
often people are only doing hour-to-hour queries within the same day, and
generally I found that most reports people run are for per-month totals
by IP address. Therefore I created acct_monthly, acct_daily, and
acct_hourly tables. acct_monthly is blazing fast because it has barely
any records in it. acct_daily has far fewer records than acct_hourly, and
although it has more than acct_monthly, finding totals per day is fast
since it has indexes that group based on that date_part() function's
results.

This method uses some extra disk space and causes some extra CPU load
because there are three times as many pmacctd processes running, but
overall it's worth it since queries are always fast, and pmacctd is never
backlogged trying frantically to commit...

Wim
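For the open indexing question above, one option is an expression index
matching the grouping function (sketch in PostgreSQL syntax; the table
name acct is a placeholder - this works because date_trunc() on a
timestamp without time zone is immutable):

```sql
-- Expression index on the hour bucket, so queries that filter or group
-- on the same expression can use the index instead of scanning the table.
CREATE INDEX acct_hour_idx
    ON acct (date_trunc('hour', stamp_inserted));

-- A query written against the identical expression can then hit it:
SELECT date_trunc('hour', stamp_inserted) AS hour, SUM(bytes)
  FROM acct
 WHERE date_trunc('hour', stamp_inserted) = '2006-04-21 20:00:00'
 GROUP BY date_trunc('hour', stamp_inserted);
```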
Re: [pmacct-discussion] discrepancy between stamp_inserted and stamp_updated ?
Sebastien Guilbaud wrote:
> I always have a difference of one hour in my database between these
> columns:
>
>   stamp_inserted         stamp_updated
>   2006-02-02 10:10:00    2006-02-02 09:16:26
>
> I've tried to activate debug mode, and the INSERTs contain timestamps
> with a one hour variation:
>
> INSERT INTO acct_v2 (stamp_updated, stamp_inserted, ip_src, ip_dst,
>   port_src, port_dst, ip_proto, mac_src, mac_dst, packets, bytes)
> VALUES (ABSTIME(1138868666)::Timestamp, ABSTIME(1138872000)::Timestamp,
>   '0.0.0.0', 'x.x.x.x', 0, 0, 0, '0:0:0:0:0:0', '0:0:0:0:0:0', 2, 96)
>
> I don't know whether this comes from nfacct or from a postgres
> setting :-(( Is anybody using postgresql 8.1 to store nfacct data??

I ran into some similar symptoms when I was using sql_history to create
hourly counters without the roundoff parameter - counters had
stamp_inserted and stamp_updated values in different hours rather than in
the same hour. Adding sql_history_roundoff fixed that problem...

sql_history: 1h
sql_history_roundoff: h

Wim
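To rule out the postgres side before touching the pmacct config, one
quick check (a sketch for PostgreSQL 8.x, where ABSTIME() is still
available) is to render the two epoch values from the debug output
directly and see how the session timezone displays them:

```sql
-- If the rendered timestamps don't match what pmacct logged, the
-- server's timezone setting is shifting them at display time rather
-- than pmacct writing the wrong values.
SHOW timezone;
SELECT ABSTIME(1138868666)::timestamp AS updated,
       ABSTIME(1138872000)::timestamp AS inserted;
```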
[pmacct-discussion] gotcha with recovery log and multiple plugins
I just (painfully) ran into a situation where I had added multiple
aggregation plugins to a configuration, for example:

! ...
aggregate[in]: dst_host
aggregate[out]: src_host
aggregate_filter[in]: dst net 192.168.1.0/24
aggregate_filter[out]: src net 192.168.1.0/24
plugins: pgsql[in], pgsql[out]
! ...
sql_recovery_logfile: /var/log/pmacct/recovery.dat
! ...
EOF

But when trying to recover using pmpgplay, it would recover only the
records that match the template. So if the template header says the data
is for dst_host, then data that has a null ip_dst but does have an ip_src
would fail, as you can't insert an empty IP address. To prevent this in
the future I've fixed my config to use a separate recovery log per
plugin:

sql_recovery_logfile[in]: /var/log/pmacct/recovery-in.dat
sql_recovery_logfile[out]: /var/log/pmacct/recovery-out.dat

I think this is a trap that other people might fall into...

Wim
[pmacct-discussion] agent_id not getting inserted
Hi, I think this is something trivial and I'm doing something obviously
wrong, but thought I'd post as I haven't been able to figure it out
yet... My config file has (among other variables):

aggregate[in]: dst_host
aggregate[out]: src_host
sql_optimize_clauses: true
sql_history: 1h
sql_table_version: 2
pmacctd_id: 6

Yet (even with 0.8.6), agent_id=6 is not being inserted. It works fine on
another installation with a pretty much identical configuration file.

Wim
Re: [pmacct-discussion] pmacctd only collecting half of the data being passed.
David Maple wrote:
> Hi all, I have pmacctd running on a dedicated machine listening on a
> gigabit Ethernet port. It is only collecting half of the data that the
> router is passing to it. For example, at this time, the 5 minute
> average from the router (confirmed with ifconfig) is 136,087,000
> bits/sec and pmacctd is reporting only 70,212,327 bits/sec. All I'm
> trying to do is produce 5 minute usage reports by IP address for both
> source and destination traffic. When I have sql_dont_try_update
> disabled, it brings the CPU on the MySQL server to 99%, so I have it
> enabled until I can put in a stronger MySQL server. I have done this
> with and without sql_optimize_clauses with no difference. Has anyone
> else gotten this working with this volume of traffic? If so, I would
> appreciate knowing what you did to get it working.

It's probably worthwhile to investigate whether the updates/inserts are
actually using the table indexes. Turn on debug in MySQL or pmacctd to
get the query that's being run, then rewrite its WHERE clause as an
equivalent SELECT and run that under EXPLAIN in the mysql command-line
tool (MySQL's EXPLAIN only works on SELECT statements). On my PostgreSQL
database I have multiple indexes to handle the updates plus the queries
for the usage reporting.

I found that aggregating by IP src/dst pairs created a crazy amount of
data. So we're running two aggregates/filters - one to look at inbound
traffic by destination IP, and another for outbound traffic by source IP.
The basic theory is that I don't care about where the traffic is going on
the Internet, just who's responsible for it.

Wim
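As a concrete illustration of that check (a sketch only - table and
column names assumed from a v2-style pmacct schema, and the literal
values are placeholders you'd take from the logged UPDATE):

```sql
-- If 'key' is NULL or 'rows' is a large fraction of the table in the
-- EXPLAIN output, the UPDATEs are table-scanning rather than using an
-- index, which would explain the pegged CPU.
EXPLAIN SELECT *
  FROM acct
 WHERE ip_src = '10.0.0.1'
   AND ip_dst = '10.0.0.2'
   AND stamp_inserted = '2009-01-01 00:00:00';
```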