Hi Paolo,
Thank you very much for your reply. I've been testing out your suggestions and
documenting the results, but in the process I found something seriously wrong
that may make these results irrelevant, so I decided to ask for help with
regards to this before spending more time on testing.
My main problem was that the process of entering flow data into MySQL using
nfacctd took a long time - sometimes the process did not complete purging the
buffer before the next buffer purge process was started, thus filling up memory
with instances of nfacctd and mysql. Nfacctd/MySQL is running on a dedicated
(otherwise idle) quad CPU, 3GHz Xeon with 2GB RAM and the number of buffer
entries purged each 5 minutes are around 1500, so we're talking an overpowered
beast dealing with peanuts.
That nfacctd spends 3-6 minutes putting these values into MySQL doesn't seem
right. Looking at the time taken, the UPDATE queries accounts for >95% of this
(though 40% of all queries are INSERTs). INSERTs are very fast - several
hundred queries per second (maybe thousands), while UPDATEs run at ~8-10
queries per second, decreasing with table size. At first I checked the myisam
indexes (no change) and even converted to InnoDB (which just made it run
slower).
The UPDATE queries made by nfacctd (without L2 information) look like this (I
modified the plugin to give a query-dump):
....
UPDATE acct_out SET packets=packets+4466, bytes=bytes+5404822,
stamp_updated=now() WHERE FROM_UNIXTIME(1133803500) = stamp_inserted AND
ip_src='81.93.162.19'
AND ip_dst=2116 AND src_port=80 AND dst_port=0 AND ip_proto='ip';
UPDATE acct_out SET packets=packets+1734, bytes=bytes+2289876,
stamp_updated=now() WHERE FROM_UNIXTIME(1133803500) = stamp_inserted AND
ip_src='81.93.161.244
' AND ip_dst=15659 AND src_port=80 AND dst_port=0 AND ip_proto='ip';
UPDATE acct_out SET packets=packets+92, bytes=bytes+70564, stamp_updated=now()
WHERE FROM_UNIXTIME(1133803500) = stamp_inserted AND ip_src='81.93.162.132' AN
D ip_dst=15659 AND src_port=80 AND dst_port=0 AND ip_proto='ip';
UPDATE acct_out SET packets=packets+10, bytes=bytes+3682, stamp_updated=now()
WHERE FROM_UNIXTIME(1133803500) = stamp_inserted AND ip_src='81.93.161.42' AND
ip_dst=15659 AND src_port=80 AND dst_port=0 AND ip_proto='ip';
......
I checked the table keys for the two tables to make sure each sub-clause in the
WHERE-clause correspondes to the primary table keys:
mysql> describe acct_in;
+----------------+---------------------+------+-----+---------------------+-------+
| Field | Type | Null | Key | Default |
Extra |
+----------------+---------------------+------+-----+---------------------+-------+
| ip_src | char(15) | | PRI | |
|
| ip_dst | char(15) | | PRI | |
|
| src_port | int(2) unsigned | | PRI | 0 |
|
| dst_port | int(2) unsigned | | PRI | 0 |
|
| ip_proto | char(6) | | PRI | |
|
| packets | int(10) unsigned | | | 0 |
|
| bytes | bigint(20) unsigned | | | 0 |
|
| stamp_inserted | datetime | | PRI | 0000-00-00 00:00:00 |
|
| stamp_updated | datetime | YES | | NULL |
|
+----------------+---------------------+------+-----+---------------------+-------+
9 rows in set (0.00 sec)
mysql> describe acct_out;
+----------------+---------------------+------+-----+---------------------+-------+
| Field | Type | Null | Key | Default |
Extra |
+----------------+---------------------+------+-----+---------------------+-------+
| ip_src | char(15) | | PRI | |
|
| ip_dst | char(15) | | PRI | |
|
| src_port | int(2) unsigned | | PRI | 0 |
|
| dst_port | int(2) unsigned | | PRI | 0 |
|
| ip_proto | char(6) | | PRI | |
|
| packets | int(10) unsigned | | | 0 |
|
| bytes | bigint(20) unsigned | | | 0 |
|
| stamp_inserted | datetime | | PRI | 0000-00-00 00:00:00 |
|
| stamp_updated | datetime | YES | | NULL |
|
+----------------+---------------------+------+-----+---------------------+-------+
9 rows in set (0.00 sec)
An EXPLAIN on a SELECT query with the same WHERE clause as nfaccd UPDATE
queries, gives the following.
mysql> explain SELECT count(*) from acct_in WHERE FROM_UNIXTIME(1133803500) =
stamp_inserted AND ip_dst='81.93.162.235' AND ip_src=3307 AND src_port=0 AND
dst_port=0 AND ip_proto='ip';
+----+-------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref
| rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | SIMPLE | acct_in | index | PRIMARY | PRIMARY | 52 | NULL
| 2571703 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
Most of this looks good: the type, possible_keys and Extra fields all make
sense as far as I can see, but the "rows" field looks all wrong. It contains
the number of rows MySQL expects it has to investigate to complete the query.
It's supposed to be 1 as the query is fully indexed, but it is far from it.
The bottom line is: (i) UPDATE-queries on the pmacct tables take forever and
(ii) EXPLAIN information on these queries yields a very high "row" count.
Looking my queries, you can see that I use AS aggregation on ip_dst on outgoing
data and ip_src on incoming. Looking at the queries generated by
mysql_plugin.c, IP-addresses are entered as quoted strings, something that
matches the column type, while AS numbers are sent unquoted. This can be seen
from mysql_plugin.c (see the two last strncat() statements):
if (what_to_count & (COUNT_SRC_AS|COUNT_SUM_AS)) {
if (primitive) {
strncat(insert_clause, ", ", SPACELEFT(insert_clause));
strncat(values[primitive].string, ", ", sizeof(values[primitive].string));
strncat(where[primitive].string, " AND ",
sizeof(where[primitive].string));
}
strncat(insert_clause, "ip_src", SPACELEFT(insert_clause));
strncat(values[primitive].string, "%u",
SPACELEFT(values[primitive].string));
strncat(where[primitive].string, "ip_src=%u",
SPACELEFT(where[primitive].string));
values[primitive].type = where[primitive].type = COUNT_SRC_AS;
values[primitive].handler = where[primitive].handler = count_src_as_handler;
primitive++;
}
Sending a modified query where AS numbers are quoted improves performance
significantly (to expected levels) and EXPLAIN gives a proper row count:
mysql> explain SELECT count(*) from acct_in WHERE FROM_UNIXTIME(1133803500) =
stamp_inserted AND ip_dst='81.93.162.235' AND ip_src='3307' AND src_port=0 AND
dst_port=0 AND ip_proto='ip';
+----+-------------+---------+-------+---------------+---------+---------+-------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref
| rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------------------------------------+------+-------------+
| 1 | SIMPLE | acct_in | const | PRIMARY | PRIMARY | 52 |
const,const,const,const,const,const | 1 | Using index |
+----+-------------+---------+-------+---------------+---------+---------+-------------------------------------+------+-------------+
1 row in set (0.00 sec)
Is there any special reason for not quoting AS-numbers? Is it a bug or have I
missed some information about changing the ip_src/ip_dst columns to be integers
in tables where they will contain AS numbers (it would reduce table size and go
a bit faster I guess, but at the cost of uniformity)?
all the best,
-- Inge