Hi All,
I have nfacctd 1.5.0rc2 collecting NetFlow v9 flows exported by a pair of
local pmacctd processes.
Every so often nfacctd crashes with a segmentation fault and I have to
restart the daemon.
According to gdb, the crash consistently occurs at the same place:
----
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nfacctd: Core Process [default]
'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000041f89c in process_v9_packet (pkt=0x80005934b00d <Address
0x80005934b00d out of bounds>,
pkt@entry=0x7fff5934ae40 "\t", len=len@entry=508,
pptrsv=pptrsv@entry=0x7fff59339580, req=req@entry=0x7fff59338f00, version=9)
at nfacctd.c:1197
(gdb) info locals
hdr_v9 = 0x7fff5934ae40
hdr_v10 = 0x7fff5934ae40
template_hdr = <optimized out>
opt_template_hdr = <optimized out>
tpl = <optimized out>
data_hdr = 0x80005934b00d
pptrs = 0x7fff59339580
fid = <optimized out>
off = 461
flowoff = <optimized out>
flowsetlen = <optimized out>
direction = 38272
FlowSeqInc = 1
HdrSz = <optimized out>
SourceId = <optimized out>
FlowSeq = <optimized out>
(gdb) info args
pkt = 0x80005934b00d <Address 0x80005934b00d out of bounds>
len = 508
pptrsv = 0x7fff59339580
req = 0x7fff59338f00
version = 9
(gdb)
----
I'm not an expert at reading gdb output, but I would be happy to provide
further gdb output if anyone would like to have a look.
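For what it's worth, a quick check of the addresses in the gdb output (plain
arithmetic on the printed values, nothing pmacct-specific) suggests the bad
pointer equals the packet start plus off plus exactly 2^32. Whether that
points to an integer-width or sign-extension problem in the offset handling
is only a guess on my part. The variable names below are my own:

```python
# Values copied from the gdb session above.
pkt_entry = 0x7fff5934ae40   # pkt@entry: start of the received packet
data_hdr = 0x80005934b00d    # data_hdr (and pkt) at the time of the crash
off = 461                    # "off" from "info locals"
length = 508                 # "len" from "info args"

# Where the pointer would be if only "off" had been added to the packet start.
expected = pkt_entry + off
# How far the crashing pointer is beyond that position.
excess = data_hdr - expected

print(hex(expected))   # 0x7fff5934b00d
print(hex(excess))     # 0x100000000, i.e. 2**32
print(off < length)    # True: the offset itself is still inside the packet
```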
It's not clear whether the segfaults are related to the following messages,
which appear frequently in the nfacctd log (but seem harmless):
----
May 01 03:15:46 INFO: unable to read next Data Flowset (incomplete NetFlow
v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
May 01 03:15:53 INFO: unable to read next Data Flowset (incomplete NetFlow
v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
May 01 03:16:11 INFO: unable to read next Data Flowset (incomplete NetFlow
v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
----
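My understanding of that message, which is an assumption on my part and not
taken from the nfacctd source, is that the collector walks the flowsets in a
packet and gives up when a flowset's declared length runs past the end of the
datagram. A minimal sketch of that kind of check, using the NetFlow v9 layout
from RFC 3954 (20-byte packet header, 4-byte flowset header); the function
name walk_flowsets is hypothetical:

```python
import struct

V9_HEADER_LEN = 20    # NetFlow v9 packet header size (RFC 3954)
FLOWSET_HDR_LEN = 4   # flowset id (2 bytes) + flowset length (2 bytes)

def walk_flowsets(pkt: bytes):
    """Return (flowsets, complete).

    flowsets is a list of (flowset_id, length) for every flowset that fits
    in the packet; complete is False if the packet ends mid-flowset.
    """
    if len(pkt) < V9_HEADER_LEN:
        return [], False
    flowsets = []
    off = V9_HEADER_LEN
    while off < len(pkt):
        # Not enough bytes left for a flowset header.
        if off + FLOWSET_HDR_LEN > len(pkt):
            return flowsets, False
        fid, flen = struct.unpack_from("!HH", pkt, off)
        # Declared length is too small or runs past the end of the packet.
        if flen < FLOWSET_HDR_LEN or off + flen > len(pkt):
            return flowsets, False
        flowsets.append((fid, flen))
        off += flen
    return flowsets, True

# Synthetic example: one 8-byte data flowset with id 256.
header = struct.pack("!HHIIII", 9, 1, 0, 0, 0, 0)
flowset = struct.pack("!HH", 256, 8) + b"\x00" * 4
print(walk_flowsets(header + flowset))       # ([(256, 8)], True)
print(walk_flowsets(header + flowset[:-2]))  # ([], False): truncated flowset
```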
There are two pmacctd (1.5.0rc2) instances acting as NetFlow probes (nfprobe
plugin) with the following configuration. The configs are identical apart
from nfprobe_engine, which is 0:1 for one instance and 0:2 for the other.
----
! pmacctd configuration
daemonize: true
pidfile: /var/run/pmacctd.eth2.pid
! syslog: daemon
logfile: /var/log/pmacct/pmacctd.eth2.log
interface: eth2
plugins: nfprobe[probe]
!
nfprobe_version: 9
nfprobe_receiver: 127.0.0.1:2100
nfprobe_source_ip: 127.0.0.1
nfprobe_direction[probe]: tag
nfprobe_engine[probe]: 0:2
!plugin_buffer_size: 819200
!plugin_pipe_size: 1638400000
plugin_buffer_size: 16384
plugin_pipe_size: 32768000
!
aggregate: dst_host, src_host, src_mac, dst_mac, vlan, proto, dst_port,
src_port, tag
!
pre_tag_map: /etc/pmacct/pretag.map
refresh_maps: true
pre_tag_map_entries: 3840
----
The nfacctd collector (the one that logs the above messages and segfaults)
has the following config:
----
! nfacctd configuration
daemonize: true
debug: false
pidfile: /var/run/nfacctd.collector.pid
! syslog: daemon
logfile: /var/log/pmacct/nfacctd.collector.log
! Listen locally only
nfacctd_ip: 127.0.0.1
nfacctd_port: 2101
nfacctd_time_new: true
plugins: mysql[inbound], mysql[outbound]
sql_optimize_clauses: true
! Tables for traffic accounting
aggregate[inbound]: src_mac, dst_mac, vlan, tag, tag2, dst_host
aggregate[outbound]: src_mac, dst_mac, vlan, tag, tag2, src_host
sql_table[inbound]: acct_v8_5m_in
sql_table[outbound]: acct_v8_5m_out
sql_history_roundoff[inbound]: m
sql_history_roundoff[outbound]: m
sql_history[inbound]: 5m
sql_refresh_time[inbound]: 300
sql_history[outbound]: 5m
sql_refresh_time[outbound]: 300
sql_dont_try_update[inbound]: true
sql_dont_try_update[outbound]: true
sql_multi_values[inbound]: 1024000
sql_multi_values[outbound]: 1024000
! End tables for traffic accounting
!plugin_buffer_size: 819200
!plugin_pipe_size: 1638400000
!plugin_buffer_size: 8192
!plugin_pipe_size: 16384000
plugin_buffer_size: 163840
plugin_pipe_size: 32768000
pre_tag_map: /etc/pmacct/pretag-netflow.map
pre_tag_filter[inbound]: 1
pre_tag_filter[outbound]: 2
refresh_maps: true
pre_tag_map_entries: 3840
sql_host: localhost
sql_user: <removed>
sql_db: <removed>
sql_passwd: <removed>
! in case of emergency, log to this file
sql_recovery_logfile[inbound]: /var/lib/pmacct/recovery-in_log
sql_recovery_logfile[outbound]: /var/lib/pmacct/recovery-out_log
----
This is all running on a Debian 7.4 server.
Does anyone have any thoughts on why nfacctd segfaults occasionally, and why
we see the occasional "unable to read next Data Flowset" messages?
Kind Regards,
Jonathan
_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists