Re: [pmacct-discussion] nfacct total bytes inconsistencies
Hello Mario,

Yes, they include everything AFAIK, but we don't have any multicast traffic and the broadcast traffic is very little on our VLANs (mostly standard LAMP servers). Plus, the uplink interfaces (which I am monitoring/exporting flows for) do not handle any broadcast traffic (except the standard ARP packets, of course).

On 30/11/2015 12:14 PM, Jentsch, Mario wrote:
> Hi Vaggelis,
>
> do the SNMP OIDs you are monitoring for these traffic numbers include packets
> that are not exported via Netflow (broadcast, multicast etc)?
>
> Regards,
> Mario

___ pmacct-discussion mailing list http://www.pmacct.net/#mailinglists
Re: [pmacct-discussion] nfacct total bytes inconsistencies
Hello Paolo,

I guess I was wrong about the numbers not being off too much. I had to wait for more data to be collected. As time passes, the total bytes accounted are getting way off.

What would be the maximum accepted discrepancy in an ideal setup? I know that there will be differences from the SNMP measurements, but how much difference is considered normal? (I know it's kind of a vague question.)

Restarting nfacctd did not change anything. I also restarted the whole box just in case. The UDP drop counters still stay unaffected:

Udp:
    78879 packets received
    590 packets to unknown port received.
    0 packet receive errors
    25 packets sent

sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
101: :A1F1 : 07 : 00: 00 10613 2 88013a354780 0
575: 017F:2BCB : 07 : 00: 1100 10201 2 8800bacfdac0 0
1720: :0044 : 07 : 00: 00 10630 2 88013a354400 0

Regarding the VLAN traffic, I do have VLAN traffic, but MikroTik does not export this field as far as I can tell from the NetFlow template.
DEBUG ( default/core ): NfV9 agent         : X.X.X.X:0
DEBUG ( default/core ): NfV9 template type : flow
DEBUG ( default/core ): NfV9 template ID   : 257
DEBUG ( default/core ): -----------------------------------------
DEBUG ( default/core ): | pen | field type     | offset | size |
DEBUG ( default/core ): | 0   | ip version     |      0 |    1 |
DEBUG ( default/core ): | 0   | IPv6 src addr  |      1 |   16 |
DEBUG ( default/core ): | 0   | IPv6 src mask  |     17 |    1 |
DEBUG ( default/core ): | 0   | input snmp     |     18 |    4 |
DEBUG ( default/core ): | 0   | IPv6 dst addr  |     22 |   16 |
DEBUG ( default/core ): | 0   | IPv6 dst mask  |     38 |    1 |
DEBUG ( default/core ): | 0   | output snmp    |     39 |    4 |
DEBUG ( default/core ): | 0   | IPv6 next hop  |     43 |   16 |
DEBUG ( default/core ): | 0   | L4 protocol    |     59 |    1 |
DEBUG ( default/core ): | 0   | tcp flags      |     60 |    1 |
DEBUG ( default/core ): | 0   | tos            |     61 |    1 |
DEBUG ( default/core ): | 0   | L4 src port    |     62 |    2 |
DEBUG ( default/core ): | 0   | L4 dst port    |     64 |    2 |
DEBUG ( default/core ): | 0   | 31             |     66 |    4 |
DEBUG ( default/core ): | 0   | 64             |     70 |    4 |
DEBUG ( default/core ): | 0   | last switched  |     74 |    4 |
DEBUG ( default/core ): | 0   | first switched |     78 |    4 |
DEBUG ( default/core ): | 0   | in bytes       |     82 |    4 |
DEBUG ( default/core ): | 0   | in packets     |     86 |    4 |
DEBUG ( default/core ): | 0   | in dst mac     |     90 |    6 |
DEBUG ( default/core ): | 0   | out src mac    |     96 |    6 |
DEBUG ( default/core ): -----------------------------------------
DEBUG ( default/core ): Netflow V9/IPFIX record size : 102
DEBUG ( default/core ):

What drives me crazy is that if I do controlled data transfers for long periods of time, nfacctd counts everything properly. I can see the rate at which the bytes counter increases in the database, and calculating from it gives exactly the Mbit/s I am doing the transfers at. So it seems that RouterOS does export the flows properly and nfacctd does measure the bytes properly. And yet, when checking the results for another IP (which has normal web traffic), the data are always off and get worse as time goes by.
What's even stranger is that the download bytes (which are always less in reality) are measured slightly higher in nfacctd (from a few MB to a few hundred MB), while the upload bytes are measured lower than what is actually going through the wire (from 1GB to 3-4GB less per hour, depending on how much traffic the server has at any given hour).

Unfortunately the collector box is not accessible from the internet. I understand that access would help you identify the issue much quicker than explaining to me every possible solution to try. I'll try to get permission to allow you access (via VPN or something) if nothing else works. I really do appreciate the offer to help! :)

I noticed today some sporadic info messages in the nfacctd output:

INFO: expecting flow '657566' but received '657621' collector=0.0.0.0:2055 agent=X.X.X.X1:0
INFO: expecting flow '657677' but received '657738' collector=0.0.0.0:2055 agent=X.X.X.X:0

Is this normal? Does that mean that it lost a flow somewhere and that's why it throws this INFO message?

I have increased the buffers:

plugin_pipe_size: 268435456
plugin_buffer_size: 268435
nfacctd_pipe_size:
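
The INFO messages above each name an expected and a received sequence number, so the size of every gap can be tallied with a short script. This is only a sketch in Python, not part of pmacct; the function name `sequence_gaps` is made up for illustration. Whether one unit of the gap is a flow or an export packet depends on the NetFlow version, so the result is best treated as a rough indicator of loss between exporter and collector.

```python
import re

# Matches the sequence-gap INFO lines quoted in this thread.
GAP_RE = re.compile(r"expecting flow '(\d+)' but received '(\d+)'")


def sequence_gaps(log_text):
    """Return (expected, received, gap) for every sequence-gap line."""
    gaps = []
    for match in GAP_RE.finditer(log_text):
        expected, received = int(match.group(1)), int(match.group(2))
        gaps.append((expected, received, received - expected))
    return gaps


# The two lines reported in the message above.
log = """\
INFO: expecting flow '657566' but received '657621' collector=0.0.0.0:2055 agent=X.X.X.X1:0
INFO: expecting flow '657677' but received '657738' collector=0.0.0.0:2055 agent=X.X.X.X:0
"""

for expected, received, gap in sequence_gaps(log):
    print(f"expected {expected}, got {received}: {gap} missing")
```

Summing the gaps over an hour of logs and comparing that with the hourly byte shortfall may show whether the two move together.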
Re: [pmacct-discussion] nfacct total bytes inconsistencies
Hi Vaggelis,

do the SNMP OIDs you are monitoring for these traffic numbers include packets that are not exported via Netflow (broadcast, multicast etc)?

Regards,
Mario

> -----Original Message-----
> From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net]
> On Behalf Of Vaggelis Koutroumpas
> Sent: Sunday, November 29, 2015 12:23 AM
> To: Paolo Lucente; pmacct-discussion@pmacct.net
> Subject: Re: [pmacct-discussion] nfacct total bytes inconsistencies
>
> It seems that the new server shows the same behavior after all :(
>
> mysql> SELECT (
>     -> SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB') as bytes FROM hourly WHERE ip_dst = '0.0.0.0' AND stamp_inserted BETWEEN '2015-11-28 20:00:00' AND '2015-11-28 23:59:59'
>     -> ) as total_out, (
>     -> SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB') as bytes FROM hourly WHERE ip_src = '0.0.0.0' AND stamp_inserted BETWEEN '2015-11-28 20:00:00' AND '2015-11-28 23:59:59'
>     -> ) as total_in;
> +-----------+----------+
> | total_out | total_in |
> +-----------+----------+
> | 101.03GB  | 15.43GB  |
> +-----------+----------+
> 1 row in set (0.05 sec)
>
> For the same time frame, Observium reports a higher 'total out' and a
> lower 'total in': http://prntscr.com/983ers
>
> I guess the 'total in' discrepancy is acceptable. But the 'total out' is
> over 6GBytes off!
>
> If I increase the time frame, the totals are further off.
>
> mysql> SELECT (
>     -> SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB') as bytes FROM hourly WHERE ip_dst = '0.0.0.0' AND stamp_inserted BETWEEN '2015-11-28 19:00:00' AND '2015-11-28 23:59:59'
>     -> ) as total_out, (
>     -> SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB') as bytes FROM hourly WHERE ip_src = '0.0.0.0' AND stamp_inserted BETWEEN '2015-11-28 19:00:00' AND '2015-11-28 23:59:59'
>     -> ) as total_in;
> +-----------+----------+
> | total_out | total_in |
> +-----------+----------+
> | 129.60GB  | 19.46GB  |
> +-----------+----------+
> 1 row in set (0.02 sec)
>
> Observium: http://prntscr.com/983nxa
>
> Here the 'total out' is 8GBytes off.
> While 'total in' seems to be a little off, but in an acceptable range.
>
> There are no drops AFAICT.
>
> root@netflow:~# netstat -s | grep Udp\: -A 5
> Udp:
>     817211 packets received
>     688 packets to unknown port received.
>     122 packet receive errors
>     14971 packets sent
>     RcvbufErrors: 122
>
> Those 122 errors have been there for hours (since before 20:00:00 of my query).
>
> root@netflow:~# cat /proc/net/udp
> sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
> 696: :0044 : 07 : 00: 00 10611 2 88007b36c780 0
> 751: :307B : 07 : 00: 00 10580 2 88007b36cb00 0
>
> I've also installed munin to monitor the performance of the server.
> MySQL does on average 40 queries/s.
> The server load is steadily 0.1.
> The avg incoming packets are ~40pps.
>
> So the server is far too idle to be losing any data.
>
> Any ideas what else to check?
> What would be an acceptable 'off percentage' of the bytes in comparison
> with SNMP measurements?
>
> Thanks.
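
The 'off percentage' asked about above can be put into numbers with a few lines of Python. This is a minimal sketch: the nfacctd totals come from the two queries in the quoted message, but the SNMP totals are assumptions reconstructed from the "6GBytes off" and "8GBytes off" remarks, since the exact Observium figures are only in the screenshots. The name `off_percentage` is made up for illustration.

```python
def off_percentage(flow_gb, snmp_gb):
    """Shortfall of the flow-based total relative to the SNMP total, in percent."""
    return (snmp_gb - flow_gb) / snmp_gb * 100.0


# nfacctd totals are from the queries above; the SNMP totals are ASSUMED
# (nfacctd total plus the quoted 6 GB and 8 GB gaps).
samples = [
    ("20:00-23:59", 101.03, 101.03 + 6.0),
    ("19:00-23:59", 129.60, 129.60 + 8.0),
]

for window, flow_gb, snmp_gb in samples:
    print(f"{window}: {off_percentage(flow_gb, snmp_gb):.2f}% below SNMP")
```

If the percentage stays roughly constant as the window grows, the shortfall scales with traffic volume rather than coming from a single burst of loss.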
Re: [pmacct-discussion] nfacct total bytes inconsistencies
On 28.11.2015 21:22, Vaggelis Koutroumpas wrote:
> Now, checking the udp drop counters on the old server, indeed I see some
> 25000+ drops. That counter seem to increase during the refresh time of
> the sql plugin. Not always though. Is there a connection between the
> drops and the mysql insert/update process? If so, would running the
> mysql server on a different server eliminate any future possibility of
> that happening again?

Get those fixed (if they increase while nfacctd is running and you can't be sure that the drops are caused by other UDP packets hitting the box). Increase the max UDP receive buffers on the box and set nfacctd_pipe_size. Make sure nfacctd is really using the new buffer size ...

You may consider offloading mysql to a different server, but usually increasing the buffers to be prepared for peaky incoming flow data should be sufficient (sure, depending on how much data you receive).

For dropped flow data you'll never get stats.

Cheers,
Markus
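
One way to see what "really using the new buffer size" means is to ask the kernel what it actually granted after a request. The sketch below is plain Python, not nfacctd code: it opens a throwaway UDP socket, requests a receive buffer, and reads the effective size back. On Linux the kernel caps an ordinary request at net.core.rmem_max and reports about double the granted value to cover its own bookkeeping; the helper name `effective_rcvbuf` is made up for illustration.

```python
import socket


def effective_rcvbuf(requested_bytes):
    """Request a UDP receive buffer and return the size the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


# Example: request 4 MiB and compare with what was actually granted.
wanted = 4 * 1024 * 1024
granted = effective_rcvbuf(wanted)
print(f"requested {wanted} bytes, kernel reports {granted} bytes")
```

If the reported value is far below the request, the system-wide maximum is probably still too low for the size being asked for.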
Re: [pmacct-discussion] nfacct total bytes inconsistencies
Hi Paolo,

> Posed I'm no expert of RouterOS; if it has a NetFlow export process,
> can you check if it pegs at 100% CPU? Or if anything suspicious emerges
> from the router logs?

The netflow process runs at 0.1-0.2% CPU (on a 36-core router). Unfortunately RouterOS' netflow options and stats are very basic and the logs do not include anything related to it :( But there isn't too much traffic (and the netflow traffic is at 50-150pps).

> On the nfacctd side, if logs are clean then it should mean internal
> buffering is OK. Still, better to double-check buffering between the
> kernel and nfacctd. At this propo, can you please follow notes in
> section D of chapter XXI of a recent pmacct QUICKSTART guide ( see
> https://github.com/paololucente/pmacct/blob/master/pmacct/QUICKSTART ),
> essentially to check if there is any UDP drops?

Today I installed a new server just for nfacctd, running the latest Debian with MySQL 5.7. I've got about 2 hours of stats so far and they seem to coincide with Observium's stats.

I did notice a 5 minute skew between what nfacctd and Observium show for the same time frame. I guess it's because nfacctd inserts each flow into the appropriate time-bin based on its timestamps, while the cron script of Observium logs the octets from SNMP at the time it runs (which may be 1-2 minutes after the 5 minute mark, since it collects data from many ports on each run).

The good thing is that, regardless of that time skew between the two systems, the amount of data measured stays on par as time goes by. I need to keep an eye on it to make sure that's the case, since I've noticed over the previous days that the discrepancies increase during the day, when traffic is at its peak.

By the way, with MySQL 5.7, nfacctd caused errors from time to time complaining that the vlan or tos fields do not have a default value (this was the SQL error). I simply altered those fields to have a default value and the errors stopped.
Maybe this is a bug caused by MySQL 5.7 and current pmacct's default db schema (I believe it is not related to my current issue since the old server runs MySQL 5.5; just mentioning it).

Now, checking the UDP drop counters on the old server, indeed I see some 25000+ drops. That counter seems to increase during the refresh time of the sql plugin. Not always, though. Is there a connection between the drops and the MySQL insert/update process? If so, would running the MySQL server on a different server eliminate any future possibility of that happening again?

I don't see any drops on the new server, so that's a good thing, and that may account for the fact that it seems to count the totals properly (I certainly hope so!)

I added nfacctd_pipe_size and modified rmem_default & rmem_max as suggested in the FAQ (silly me, I didn't read it all the way to the end!) but I still see the drops counter increase. But if the new server works OK as it is, I don't really care if the old one has drops (for any reason).

Also, I couldn't find any documentation for this config parameter (nfacctd_pipe_size) on the Official Config Keys wiki page: http://wiki.pmacct.net/OfficialConfigKeys

> Finally, i see sql_refresh_time and sql_history are set to different
> values - meaning SQL UPDATE queries are involved; this is OK as long
> as the actual database does not suffer from them; can you check that
> SQL writer processes are not piling up? This can be done with a simple
> "ps auxw | grep nfacctd".

I've set them that low to troubleshoot the problem (to check the new data in the database at 1 minute intervals). Watching the insert/update queries fly by in the terminal in debug mode, each run takes about 10 seconds to finish, without any errors, so there doesn't seem to be any issue there.

I'll keep an eye on the new setup to see how it goes and I'll keep you posted if the issue persists.

Thank you for your help :)

Cheers,
Vaggelis.
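
For watching the per-socket drop counter discussed in this message over time, the last column of /proc/net/udp can be read with a small parser. The following is a rough Python sketch, not a pmacct tool: `udp_drops` is a made-up helper name, and the sample text is hypothetical data in the usual Linux layout, not output from this thread.

```python
def udp_drops(proc_net_udp_text):
    """Return (local_port, drops) for each socket row in /proc/net/udp text."""
    rows = []
    for line in proc_net_udp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 13:
            continue  # ignore blank or truncated rows
        local_port = int(fields[1].rsplit(":", 1)[1], 16)  # port is in hex
        rows.append((local_port, int(fields[-1])))  # drops is the last column
    return rows


# HYPOTHETICAL sample in the usual /proc/net/udp layout; not taken from the thread.
sample = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
  696: 00000000:0044 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 10611 2 ffff88007b36c780 0
  751: 00000000:0807 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 10580 2 ffff88007b36cb00 122
"""

for port, drops in udp_drops(sample):
    print(f"port {port}: {drops} drops")
```

On a Linux collector the same function could be fed the contents of /proc/net/udp at each sql_refresh_time boundary, to see whether the counter for the collector port climbs while the SQL plugin is writing.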