Re: [pmacct-discussion] nfacct total bytes inconsistencies

2015-12-01 Thread Vaggelis Koutroumpas
Hello Mario,

Yes, they include everything AFAIK, but we don't have any multicast
traffic and the broadcast traffic is very light on our VLANs
(mostly standard LAMP servers).
Plus the uplink interfaces (which I am monitoring/exporting flows for)
do not handle any broadcast traffic (except the standard arp packets of
course).


On 30/11/2015 12:14 μμ, Jentsch, Mario wrote:
> Hi Vaggelis,
> 
> do the SNMP OIDs you are monitoring for these traffic numbers include packets 
> that are not exported via NetFlow (broadcast, multicast, etc.)?
> 
> Regards,
> Mario
> 

___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists

Re: [pmacct-discussion] nfacct total bytes inconsistencies

2015-12-01 Thread Vaggelis Koutroumpas
Hello Paolo,

I guess I was wrong about the numbers not being off by much. I had to
wait for more data to be collected; as time passes, the total bytes
accounted get further and further off.

What would be the maximum accepted discrepancy in an ideal setup?
I know that there will be differences between SNMP measurements, but how
much difference is considered normal? (I know it's kind of a vague question)

Restarting nfacctd did not change anything.

I also restarted the whole box just in case.
The UDP drop counters still stay unaffected:

Udp:
78879 packets received
590 packets to unknown port received.
0 packet receive errors
25 packets sent
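To spot changes without eyeballing netstat each time, something like this prints every UDP counter from /proc/net/snmp by name (just a sketch; the field positions simply follow the file's own Udp header line):

```shell
# Sketch: print each UDP counter from /proc/net/snmp by name, so the
# InErrors / RcvbufErrors deltas are easy to compare between runs.
awk '/^Udp:/ {
    if ($2 !~ /^[0-9-]/) { for (i = 2; i <= NF; i++) hdr[i] = $i }   # header row
    else { for (i = 2; i <= NF; i++) print hdr[i] ": " $i }          # value row
}' /proc/net/snmp
```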

  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
  101: :A1F1 : 07 : 00:  00 10613 2 88013a354780 0
  575: 017F:2BCB : 07 : 00:    1100 10201 2 8800bacfdac0 0
 1720: :0044 : 07 : 00:  00 10630 2 88013a354400 0


Regarding the VLAN traffic: I do have VLAN traffic, but Mikrotik does
not export this field, as far as I can tell from the netflow template.

DEBUG ( default/core ): NfV9 agent : X.X.X.X:0
DEBUG ( default/core ): NfV9 template type : flow
DEBUG ( default/core ): NfV9 template ID   : 257
DEBUG ( default/core ): -------------------------------------------
DEBUG ( default/core ): | pen | field type     | offset | size |
DEBUG ( default/core ): |  0  | ip version     |      0 |    1 |
DEBUG ( default/core ): |  0  | IPv6 src addr  |      1 |   16 |
DEBUG ( default/core ): |  0  | IPv6 src mask  |     17 |    1 |
DEBUG ( default/core ): |  0  | input snmp     |     18 |    4 |
DEBUG ( default/core ): |  0  | IPv6 dst addr  |     22 |   16 |
DEBUG ( default/core ): |  0  | IPv6 dst mask  |     38 |    1 |
DEBUG ( default/core ): |  0  | output snmp    |     39 |    4 |
DEBUG ( default/core ): |  0  | IPv6 next hop  |     43 |   16 |
DEBUG ( default/core ): |  0  | L4 protocol    |     59 |    1 |
DEBUG ( default/core ): |  0  | tcp flags      |     60 |    1 |
DEBUG ( default/core ): |  0  | tos            |     61 |    1 |
DEBUG ( default/core ): |  0  | L4 src port    |     62 |    2 |
DEBUG ( default/core ): |  0  | L4 dst port    |     64 |    2 |
DEBUG ( default/core ): |  0  | 31             |     66 |    4 |
DEBUG ( default/core ): |  0  | 64             |     70 |    4 |
DEBUG ( default/core ): |  0  | last switched  |     74 |    4 |
DEBUG ( default/core ): |  0  | first switched |     78 |    4 |
DEBUG ( default/core ): |  0  | in bytes       |     82 |    4 |
DEBUG ( default/core ): |  0  | in packets     |     86 |    4 |
DEBUG ( default/core ): |  0  | in dst mac     |     90 |    6 |
DEBUG ( default/core ): |  0  | out src mac    |     96 |    6 |
DEBUG ( default/core ): -------------------------------------------
DEBUG ( default/core ): Netflow V9/IPFIX record size : 102
DEBUG ( default/core ):


What drives me crazy is that if I do controlled data transfers for long
periods of time, nfacctd counts everything properly. I can see the rate
at which the bytes counter increases in the database, and the
calculations work out to exactly the Mbit/s I am transferring at.
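For reference, the arithmetic is just a byte-counter delta converted to Mbit/s; the readings below are made-up example values, not from my database:

```shell
# Sketch: bytes-counter delta over an interval -> Mbit/s.
# b0 and b1 are example readings taken 300 seconds apart (made up).
awk -v b0=1200000000 -v b1=1650000000 -v t=300 \
    'BEGIN { printf "%.2f Mbit/s\n", (b1 - b0) * 8 / t / 1e6 }'
# prints: 12.00 Mbit/s
```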

So it seems that RouterOS does export the flows properly and nfacctd
does measure the bytes properly.
And yet, when checking the results for another IP (which has normal web
traffic), the data is always off and gets worse as time goes by.
What's even stranger is that the download bytes (which are always lower
in reality) are measured slightly higher by nfacctd (from a few MB to a
few hundred MB), while the upload data is measured lower than what
actually goes through the wire (from 1 GB to 3-4 GB less per hour,
depending on how much traffic the server has at any given hour).


Unfortunately the collector box is not accessible from the internet. I
understand that this would help you identify the issue much quicker than
explaining to me every possible solution to try.
I'll try to get permission to allow you access (via VPN or something) if
nothing else works.
I really do appreciate the offer to help! :)


Today I noticed some sporadic INFO messages in nfacctd's output.

INFO: expecting flow '657566' but received '657621'
collector=0.0.0.0:2055 agent=X.X.X.X1:0
INFO: expecting flow '657677' but received '657738'
collector=0.0.0.0:2055 agent=X.X.X.X:0


Is this normal? Does that mean that it lost a flow somewhere and that's
why it throws this INFO message?
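In case it's useful, I tally the reported gaps like this (a sketch; nfacctd.log is just a placeholder for wherever the daemon's output ends up):

```shell
# Sketch: sum the gaps between expected and received sequence numbers
# from the INFO lines above, to estimate how much went missing.
# nfacctd.log is a placeholder file name.
awk -F\' '/expecting flow/ { lost += $4 - $2 } END { print lost + 0, "missing" }' nfacctd.log
```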

I have increased the buffers:

plugin_pipe_size:   268435456
plugin_buffer_size: 268435
nfacctd_pipe_size:  
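For what it's worth, the kernel silently caps a socket's receive buffer at net.core.rmem_max, so I also check that ceiling before trusting the pipe sizes above (a sketch; the 268435456 below just mirrors my plugin_pipe_size):

```shell
# Sketch: the kernel caps SO_RCVBUF at net.core.rmem_max, so
# nfacctd_pipe_size is silently limited by this value.
cat /proc/sys/net/core/rmem_max
# To raise the ceiling (as root), e.g. to match plugin_pipe_size above:
# sysctl -w net.core.rmem_max=268435456
```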

Re: [pmacct-discussion] nfacct total bytes inconsistencies

2015-11-30 Thread Jentsch, Mario
Hi Vaggelis,

do the SNMP OIDs you are monitoring for these traffic numbers include packets 
that are not exported via NetFlow (broadcast, multicast, etc.)?

Regards,
Mario

> -Original Message-
> From: pmacct-discussion [mailto:pmacct-discussion-boun...@pmacct.net]
> On Behalf Of Vaggelis Koutroumpas
> Sent: Sunday, November 29, 2015 12:23 AM
> To: Paolo Lucente; pmacct-discussion@pmacct.net
> Subject: Re: [pmacct-discussion] nfacct total bytes inconsistencies
> 
> It seems that the new server shows the same behavior after all :(
> 
> 
> mysql> SELECT (
> -> SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB')
> as bytes FROM hourly WHERE ip_dst = '0.0.0.0' AND stamp_inserted
> BETWEEN  '2015-11-28 20:00:00'  AND  '2015-11-28 23:59:59'
> -> ) as total_out, (
> -> SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB')
> as bytes FROM hourly WHERE ip_src = '0.0.0.0' AND stamp_inserted
> BETWEEN  '2015-11-28 20:00:00'  AND  '2015-11-28 23:59:59'
> -> ) as total_in;
> +---+--+
> | total_out | total_in |
> +---+--+
> | 101.03GB  | 15.43GB  |
> +---+--+
> 1 row in set (0.05 sec)
> 
> While at the same time-frame observium reports higher 'total out' and
> less 'total in' http://prntscr.com/983ers
> 
> I guess the 'total in' discrepancy is acceptable. But the 'total out' is
> over 6Gbytes off!
> 
> If I increase the time-frame, the totals drift further off.
> 
> mysql> SELECT (
> -> SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB')
> as bytes FROM hourly WHERE ip_dst = '0.0.0.0' AND stamp_inserted
> BETWEEN  '2015-11-28 19:00:00'  AND  '2015-11-28 23:59:59'
> -> ) as total_out, (
> -> SELECT concat(truncate((sum(bytes)/1024/1024/1024),2), 'GB')
> as bytes FROM hourly WHERE ip_src = '0.0.0.0' AND stamp_inserted
> BETWEEN  '2015-11-28 19:00:00'  AND  '2015-11-28 23:59:59'
> -> ) as total_in;
> +---+--+
> | total_out | total_in |
> +---+--+
> | 129.60GB  | 19.46GB  |
> +---+--+
> 1 row in set (0.02 sec)
> 
> Observium: http://prntscr.com/983nxa
> 
> Here the 'total out' is 8 GBytes off, while 'total in' is a little off
> but within acceptable range.
> 
> 
> There are no drops AFAICT.
> 
> root@netflow:~# netstat -s | grep Udp\: -A 5
> Udp:
> 817211 packets received
> 688 packets to unknown port received.
> 122 packet receive errors
> 14971 packets sent
> RcvbufErrors: 122
> 
> Those 122 errors are there for hours (before 20:00:00 of my query).
> 
> root@netflow:~# cat /proc/net/udp
>   sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
>   696: :0044 : 07 : 00:  00 10611 2 88007b36c780 0
>   751: :307B : 07 : 00:  00 10580 2 88007b36cb00 0
> 
> 
> I've also installed munin to monitor the performance of the server.
> MySQL does on average 40 queries/s.
> The server load is steadily 0.1
> The avg incoming packets are ~40pps
> 
> So the server is pretty much idle; it shouldn't be losing any data.
> 
> Any ideas what else to check?
> What would be an acceptable 'off percentage' of the bytes in comparison
> with SNMP measurements?
> 
> 
> Thanks.
> 


Re: [pmacct-discussion] nfacct total bytes inconsistencies

2015-11-29 Thread Markus Weber

On 28.11.2015 21:22, Vaggelis Koutroumpas wrote:

Now, checking the udp drop counters on the old server, indeed I see some
25000+ drops. That counter seems to increase during the refresh time of
the sql plugin. Not always though. Is there a connection between the
drops and the mysql insert/update process? If so, would running the
mysql server on a different server eliminate any future possibility of
that happening again?


Get those fixed (if they increase while nfacctd is running and you can't
rule out that the drops are caused by other UDP packets hitting the
box). Increase the max UDP receive buffers on the box and set nfacctd_pipe_size.
Make sure nfacctd is really using the new buffer size ... You may consider
offloading MySQL to a different server, but usually increasing the buffers
to absorb peaks of incoming flow data should be sufficient (depending,
of course, on how much data you receive).
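A quick way to check might be to look at the collector socket's line in /proc/net/udp directly (a sketch; 2055 is the collector port shown in the logs above, 0807 in the hex form the file uses):

```shell
# Sketch: show the header plus the UDP socket bound to port 2055
# (hex 0807); a growing rx_queue or a non-zero trailing "drops"
# column means the buffer is still too small.
awk 'NR == 1 || $2 ~ /:0807$/' /proc/net/udp
```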

For dropped flow data you'll never get stats.

Cheers,
Markus



Re: [pmacct-discussion] nfacct total bytes inconsistencies

2015-11-28 Thread Vaggelis Koutroumpas
Hi Paolo,

> Granted, I'm no expert on RouterOS; if it has a NetFlow export process,
> can you check if it pegs at 100% CPU? Or if anything suspicious emerges
> from the router logs?

The netflow process runs at 0.1-0.2% CPU (on a 36-core router).
Unfortunately RouterOS' netflow options and stats are very basic, and
the logs do not include anything related to it :(
But there isn't too much traffic anyway (the netflow export is at 50-150 pps).

> On the nfacctd side, if logs are clean then it should mean internal
> buffering is OK. Still, better to double-check buffering between the
> kernel and nfacctd. Apropos, can you please follow the notes in
> section D of chapter XXI of a recent pmacct QUICKSTART guide ( see
> https://github.com/paololucente/pmacct/blob/master/pmacct/QUICKSTART ),
> essentially to check if there are any UDP drops?

Today I installed a new server just for nfacctd running the latest
debian with MySQL 5.7.
I've got about 2 hours of stats so far and they seem to coincide with
Observium's stats, though I noticed a 5-minute skew between what
nfacctd and Observium show for the same time frame. I guess that's
because nfacctd inserts each flow into the appropriate time-bin based on
its timestamps, while Observium's cron script logs the octets from
SNMP at the time it runs (which may be 1-2 minutes after the 5-minute
mark, since it collects data from many ports on each run).

The good thing is that regardless of that time skew between the two
systems, the amounts of data measured stay on par as time goes by.
I need to keep an eye on it to make sure that's the case since I've
noticed the previous days that during the day when traffic is at its
peak the discrepancies increase.

By the way, with MySQL 5.7, nfacctd raised errors from time to time
complaining that the vlan or tos fields do not have a default value
(this was the SQL error).
I simply altered those fields to have a default value and the
errors stopped. Maybe this is a bug between MySQL 5.7 and
pmacct's current default DB schema (I believe it is not related to my
current issue, since the old server runs MySQL 5.5; just mentioning it).

Now, checking the UDP drop counters on the old server, indeed I see some
25000+ drops. That counter seems to increase during the refresh time of
the sql plugin. Not always, though. Is there a connection between the
drops and the mysql insert/update process? If so, would running the
mysql server on a different server eliminate any future possibility of
that happening again?

I don't see any drops on the new server, so that's a good thing, and
that may account for the fact that it seems to count the totals properly
(I certainly hope so!)

I added the nfacctd_pipe_size and modified the rmem_default & rmem_max
as suggested in the FAQ (silly me, I didn't read it all the way to the
end!) but I still see the drops counter increase.
But if the new server works OK as it is, I don't really care if the old
one has drops (for any reason).

Also, I couldn't find any documentation for this config parameter on the
Official Config Keys wiki page: http://wiki.pmacct.net/OfficialConfigKeys

> Finally, i see sql_refresh_time and sql_history are set to different
> values - meaning SQL UPDATE queries are involved; this is OK as long
> as the actual database does not suffer from them; can you check that
> SQL writer processes are not piling up? This can be done with a simple
> "ps auxw | grep nfacctd".

I've set them that low to troubleshoot the problem (to check the new
data in 1-minute intervals in the database).
Watching the insert/update queries fly by in the terminal during debug
mode, each run takes about 10 seconds to finish - without any errors, so
there doesn't seem to be any issue there.


I'll keep an eye on the new setup to see how it goes and I'll keep you
posted if the issue persists.

Thank you for your help :)

Cheers,
Vaggelis.
