Hi Paolo,

Thank you very much for your reply. I've been testing out your suggestions and 
documenting the results, but in the process I found something seriously wrong 
that may make these results irrelevant, so I decided to ask for help with 
regards to this before spending more time on testing.

My main problem was that the process of entering flow data into MySQL using 
nfacctd took a long time - sometimes the process did not complete purging the 
buffer before the next buffer purge process was started, thus filling up memory 
with instances of nfacctd and mysql. Nfacctd/MySQL is running on a dedicated 
(otherwise idle) quad CPU, 3GHz Xeon with 2GB RAM and the number of buffer 
entries purged each 5 minutes are around 1500, so we're talking an overpowered 
beast dealing with peanuts.

That nfacctd spends 3-6 minutes putting these values into MySQL doesn't seem 
right. Looking at the time taken, the UPDATE queries accounts for >95% of this 
(though 40% of all queries are INSERTs). INSERTs are very fast - several 
hundred queries per second (maybe thousands), while UPDATEs run at ~8-10 
queries per second, decreasing with table size. At first I checked the myisam 
indexes (no change) and even converted to InnoDB (which just made it run 
slower).

The UPDATE queries made by nfacctd (without L2 information) look like this (I 
modified the plugin to give a query-dump):

....
UPDATE acct_out SET packets=packets+4466, bytes=bytes+5404822, 
stamp_updated=now() WHERE FROM_UNIXTIME(1133803500) = stamp_inserted AND 
ip_src='81.93.162.19'
 AND ip_dst=2116 AND src_port=80 AND dst_port=0 AND ip_proto='ip';
UPDATE acct_out SET packets=packets+1734, bytes=bytes+2289876, 
stamp_updated=now() WHERE FROM_UNIXTIME(1133803500) = stamp_inserted AND 
ip_src='81.93.161.244
' AND ip_dst=15659 AND src_port=80 AND dst_port=0 AND ip_proto='ip';
UPDATE acct_out SET packets=packets+92, bytes=bytes+70564, stamp_updated=now() 
WHERE FROM_UNIXTIME(1133803500) = stamp_inserted AND ip_src='81.93.162.132' AN
D ip_dst=15659 AND src_port=80 AND dst_port=0 AND ip_proto='ip';
UPDATE acct_out SET packets=packets+10, bytes=bytes+3682, stamp_updated=now() 
WHERE FROM_UNIXTIME(1133803500) = stamp_inserted AND ip_src='81.93.161.42' AND
ip_dst=15659 AND src_port=80 AND dst_port=0 AND ip_proto='ip';
......

I checked the table keys for the two tables to make sure each sub-clause in the 
WHERE-clause correspondes to the primary table keys:

mysql> describe acct_in;
+----------------+---------------------+------+-----+---------------------+-------+
| Field          | Type                | Null | Key | Default             | 
Extra |
+----------------+---------------------+------+-----+---------------------+-------+
| ip_src         | char(15)            |      | PRI |                     |     
  |
| ip_dst         | char(15)            |      | PRI |                     |     
  |
| src_port       | int(2) unsigned     |      | PRI | 0                   |     
  |
| dst_port       | int(2) unsigned     |      | PRI | 0                   |     
  |
| ip_proto       | char(6)             |      | PRI |                     |     
  |
| packets        | int(10) unsigned    |      |     | 0                   |     
  |
| bytes          | bigint(20) unsigned |      |     | 0                   |     
  |
| stamp_inserted | datetime            |      | PRI | 0000-00-00 00:00:00 |     
  |
| stamp_updated  | datetime            | YES  |     | NULL                |     
  |
+----------------+---------------------+------+-----+---------------------+-------+
9 rows in set (0.00 sec)

mysql> describe acct_out;
+----------------+---------------------+------+-----+---------------------+-------+
| Field          | Type                | Null | Key | Default             | 
Extra |
+----------------+---------------------+------+-----+---------------------+-------+
| ip_src         | char(15)            |      | PRI |                     |     
  |
| ip_dst         | char(15)            |      | PRI |                     |     
  |
| src_port       | int(2) unsigned     |      | PRI | 0                   |     
  |
| dst_port       | int(2) unsigned     |      | PRI | 0                   |     
  |
| ip_proto       | char(6)             |      | PRI |                     |     
  |
| packets        | int(10) unsigned    |      |     | 0                   |     
  |
| bytes          | bigint(20) unsigned |      |     | 0                   |     
  |
| stamp_inserted | datetime            |      | PRI | 0000-00-00 00:00:00 |     
  |
| stamp_updated  | datetime            | YES  |     | NULL                |     
  |
+----------------+---------------------+------+-----+---------------------+-------+
9 rows in set (0.00 sec)



An EXPLAIN on a SELECT query with the same WHERE clause as nfaccd UPDATE 
queries, gives the following.

mysql> explain SELECT count(*) from acct_in WHERE FROM_UNIXTIME(1133803500) = 
stamp_inserted AND ip_dst='81.93.162.235' AND ip_src=3307 AND src_port=0 AND 
dst_port=0 AND ip_proto='ip';
+----+-------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  
| rows    | Extra                    |
+----+-------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
|  1 | SIMPLE      | acct_in | index | PRIMARY       | PRIMARY |      52 | NULL 
| 2571703 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)


Most of this looks good: the type, possible_keys and Extra fields all make 
sense as far as I can see, but the "rows" field looks all wrong. It contains 
the number of rows MySQL expects it has to investigate to complete the query. 
It's supposed to be 1 as the query is fully indexed, but it is far from it. 

The bottom line is: (i) UPDATE-queries on the pmacct tables take forever and 
(ii) EXPLAIN information on these queries yields a very high "row" count.

Looking my queries, you can see that I use AS aggregation on ip_dst on outgoing 
data and ip_src on incoming. Looking at the queries generated by 
mysql_plugin.c, IP-addresses are entered as quoted strings, something that 
matches the column type, while AS numbers are sent unquoted. This can be seen 
from mysql_plugin.c (see the two last strncat() statements):

  if (what_to_count & (COUNT_SRC_AS|COUNT_SUM_AS)) {
    if (primitive) {
      strncat(insert_clause, ", ", SPACELEFT(insert_clause));
      strncat(values[primitive].string, ", ", sizeof(values[primitive].string));
      strncat(where[primitive].string, " AND ", 
sizeof(where[primitive].string));
    }
    strncat(insert_clause, "ip_src", SPACELEFT(insert_clause));
    strncat(values[primitive].string, "%u", 
SPACELEFT(values[primitive].string));
    strncat(where[primitive].string, "ip_src=%u", 
SPACELEFT(where[primitive].string));
    values[primitive].type = where[primitive].type = COUNT_SRC_AS;
    values[primitive].handler = where[primitive].handler = count_src_as_handler;
    primitive++;
  }


Sending a modified query where AS numbers are quoted improves performance 
significantly (to expected levels) and EXPLAIN gives a proper row count:

mysql> explain SELECT count(*) from acct_in WHERE FROM_UNIXTIME(1133803500) = 
stamp_inserted AND ip_dst='81.93.162.235' AND ip_src='3307' AND src_port=0 AND 
dst_port=0 AND ip_proto='ip';
+----+-------------+---------+-------+---------------+---------+---------+-------------------------------------+------+-------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  
                               | rows | Extra       |
+----+-------------+---------+-------+---------------+---------+---------+-------------------------------------+------+-------------+
|  1 | SIMPLE      | acct_in | const | PRIMARY       | PRIMARY |      52 | 
const,const,const,const,const,const |    1 | Using index |
+----+-------------+---------+-------+---------------+---------+---------+-------------------------------------+------+-------------+
1 row in set (0.00 sec)

Is there any special reason for not quoting AS-numbers? Is it a bug or have I 
missed some information about changing the ip_src/ip_dst columns to be integers 
in tables where they will contain AS numbers (it would reduce table size and go 
a bit faster I guess, but at the cost of uniformity)?

all the best,

-- Inge

Reply via email to