Hi All,

I have nfacctd 1.5.0rc2 collecting NetFlow v9 flows exported by a pair of 
pmacctd processes.

Every so often, I observe segmentation faults in nfacctd requiring me to 
restart the daemon.

According to gdb, the crash consistently happens in the same place:

----
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nfacctd: Core Process [default]'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000041f89c in process_v9_packet (pkt=0x80005934b00d <Address 0x80005934b00d out of bounds>,
    pkt@entry=0x7fff5934ae40 "\t", len=len@entry=508, pptrsv=pptrsv@entry=0x7fff59339580, req=req@entry=0x7fff59338f00, version=9)
    at nfacctd.c:1197
(gdb) info locals
hdr_v9 = 0x7fff5934ae40
hdr_v10 = 0x7fff5934ae40
template_hdr = <optimized out>
opt_template_hdr = <optimized out>
tpl = <optimized out>
data_hdr = 0x80005934b00d
pptrs = 0x7fff59339580
fid = <optimized out>
off = 461
flowoff = <optimized out>
flowsetlen = <optimized out>
direction = 38272
FlowSeqInc = 1
HdrSz = <optimized out>
SourceId = <optimized out>
FlowSeq = <optimized out>
(gdb) info args
pkt = 0x80005934b00d <Address 0x80005934b00d out of bounds>
len = 508
pptrsv = 0x7fff59339580
req = 0x7fff59338f00
version = 9
(gdb)
----

I'm not an expert at reading gdb output, but would be happy to provide 
further gdb output if anyone would like to have a look.
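
For what it's worth, the addresses in the backtrace can be compared with a 
bit of arithmetic. The snippet below is only a sketch using the values 
printed by gdb above (the variable names are mine, not nfacctd's); it checks 
where the next flowset header should have been (pkt@entry plus off) against 
the pointer that actually faulted (data_hdr).

```python
# Values copied from the gdb session above.
PKT_ENTRY = 0x7FFF5934AE40   # pkt@entry, same address as hdr_v9 / hdr_v10
DATA_HDR = 0x80005934B00D    # data_hdr, same value as the out-of-bounds pkt
OFF = 461                    # "off" local: bytes of the datagram consumed
LEN = 508                    # "len" argument: size of the datagram

expected = PKT_ENTRY + OFF   # where the next flowset header should start
delta = DATA_HDR - expected  # distance of the faulting pointer from it

print(f"expected next header: {expected:#x}")
print(f"faulting pointer:     {DATA_HDR:#x}")
print(f"difference:           {delta:#x}")
print(f"bytes left in packet: {LEN - OFF}")
```

The difference comes out as exactly 0x100000000 (2^32), with 47 bytes of the 
datagram still unread. That might be consistent with a 32-bit quantity 
wrapping somewhere in the pointer/offset arithmetic, but that is only a guess 
on my part.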

It's not clear whether the segfaults are related to the following messages, 
which appear frequently in the nfacctd log (but seem harmless):

----
May 01 03:15:46 INFO: unable to read next Data Flowset (incomplete NetFlow v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
May 01 03:15:53 INFO: unable to read next Data Flowset (incomplete NetFlow v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
May 01 03:16:11 INFO: unable to read next Data Flowset (incomplete NetFlow v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
----
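
To illustrate what an "incomplete packet" means here: a NetFlow v9 datagram 
is a 20-byte header followed by flowsets, each starting with a 2-byte flowset 
ID and a 2-byte length that includes that 4-byte flowset header. The sketch 
below walks a datagram and stops as soon as a flowset claims more bytes than 
remain. It is not nfacctd's parsing code; check_v9_packet and the test 
datagram are made up for illustration.

```python
import struct

V9_HEADER_LEN = 20    # version, count, sysUptime, unix_secs, sequence, source_id
FLOWSET_HDR_LEN = 4   # flowset id (2 bytes) + flowset length (2 bytes)


def check_v9_packet(pkt):
    """Walk the flowsets of one NetFlow v9 datagram.

    Returns (flowsets, problem): flowsets is a list of (id, offset, length)
    tuples, problem is None or a string describing the first inconsistency.
    """
    flowsets = []
    if len(pkt) < V9_HEADER_LEN:
        return flowsets, "packet shorter than the v9 header"
    (version,) = struct.unpack_from("!H", pkt, 0)
    if version != 9:
        return flowsets, f"unexpected version {version}"

    off = V9_HEADER_LEN
    while off < len(pkt):
        left = len(pkt) - off
        if left < FLOWSET_HDR_LEN:
            return flowsets, f"{left} stray bytes at offset {off}"
        fid, flen = struct.unpack_from("!HH", pkt, off)
        # The length field covers the 4-byte flowset header itself, so it can
        # never legitimately be smaller than that or larger than what is left.
        if flen < FLOWSET_HDR_LEN or flen > left:
            return flowsets, (f"flowset {fid} at offset {off} claims "
                              f"{flen} bytes, {left} left")
        flowsets.append((fid, off, flen))
        off += flen
    return flowsets, None


# Hypothetical test datagram: a v9 header plus one data flowset (id 256)
# that claims 60 bytes while only 8 are actually present.
header = struct.pack("!HHIIII", 9, 1, 0, 0, 0, 0)
truncated = header + struct.pack("!HH", 256, 60) + b"\x00" * 4
print(check_v9_packet(truncated))
```

Running something like this over captured datagrams (e.g. payloads extracted 
from a tcpdump of the loopback interface) would show whether the probes 
really emit flowsets whose length field overruns the datagram.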

There are two pmacctd (1.5.0rc2) instances acting as NetFlow probes (nfprobe 
plugin) with the following configuration. The configs are identical apart 
from nfprobe_engine, which differs per instance (0:1 and 0:2).

----
! pmacctd configuration
daemonize: true
pidfile: /var/run/pmacctd.eth2.pid
! syslog: daemon
logfile: /var/log/pmacct/pmacctd.eth2.log

interface: eth2

plugins: nfprobe[probe]
!
nfprobe_version: 9
nfprobe_receiver: 127.0.0.1:2100
nfprobe_source_ip: 127.0.0.1
nfprobe_direction[probe]: tag
nfprobe_engine[probe]: 0:2

!plugin_buffer_size: 819200
!plugin_pipe_size: 1638400000

plugin_buffer_size: 16384 
plugin_pipe_size: 32768000

!
aggregate: dst_host, src_host, src_mac, dst_mac, vlan, proto, dst_port, src_port, tag
!
pre_tag_map: /etc/pmacct/pretag.map
refresh_maps: true
pre_tag_map_entries: 3840
----

The nfacctd collector (the one that logs the above messages and segfaults) 
uses the following config:

----
! nfacctd configuration
daemonize: true
debug: false
pidfile: /var/run/nfacctd.collector.pid
! syslog: daemon
logfile: /var/log/pmacct/nfacctd.collector.log

! Listen locally only
nfacctd_ip: 127.0.0.1
nfacctd_port: 2101

nfacctd_time_new: true

plugins: mysql[inbound], mysql[outbound]

sql_optimize_clauses: true

! Tables for traffic accounting
aggregate[inbound]: src_mac, dst_mac, vlan, tag, tag2, dst_host
aggregate[outbound]: src_mac, dst_mac, vlan, tag, tag2, src_host

sql_table[inbound]:  acct_v8_5m_in
sql_table[outbound]:  acct_v8_5m_out

sql_history_roundoff[inbound]: m
sql_history_roundoff[outbound]: m

sql_history[inbound]: 5m
sql_refresh_time[inbound]: 300
sql_history[outbound]: 5m
sql_refresh_time[outbound]: 300

sql_dont_try_update[inbound]: true
sql_dont_try_update[outbound]: true
sql_multi_values[inbound]: 1024000
sql_multi_values[outbound]: 1024000

! End tables for traffic accounting

!plugin_buffer_size: 819200
!plugin_pipe_size: 1638400000

!plugin_buffer_size: 8192
!plugin_pipe_size: 16384000

plugin_buffer_size: 163840
plugin_pipe_size: 32768000

pre_tag_map: /etc/pmacct/pretag-netflow.map

pre_tag_filter[inbound]: 1
pre_tag_filter[outbound]: 2

refresh_maps: true
pre_tag_map_entries: 3840

sql_host: localhost
sql_user: <removed>
sql_db: <removed>
sql_passwd: <removed>

! in case of emergency, log to this file
sql_recovery_logfile[inbound]: /var/lib/pmacct/recovery-in_log
sql_recovery_logfile[outbound]: /var/lib/pmacct/recovery-out_log
----
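
Note that, as posted, the probe config sends to port 2100 (nfprobe_receiver) 
while the collector listens on port 2101 (nfacctd_port), and the log lines 
above show nfacctd=127.0.0.1:2101. The snippet below is just a quick way to 
compare the two settings mechanically. It assumes only the config syntax 
visible above ("key: value" lines, with "!" starting a comment); 
parse_pmacct_config is a made-up helper, not part of pmacct.

```python
def parse_pmacct_config(text):
    """Parse 'key: value' lines; lines starting with '!' are comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        # Split on the first colon only, so "127.0.0.1:2100" stays intact.
        key, sep, value = line.partition(":")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


# Relevant lines copied from the two configs above.
PROBE = """
! pmacctd configuration
nfprobe_version: 9
nfprobe_receiver: 127.0.0.1:2100
"""
COLLECTOR = """
! nfacctd configuration
nfacctd_ip: 127.0.0.1
nfacctd_port: 2101
"""

probe = parse_pmacct_config(PROBE)
collector = parse_pmacct_config(COLLECTOR)

receiver = probe["nfprobe_receiver"]
listener = f'{collector["nfacctd_ip"]}:{collector["nfacctd_port"]}'
print(receiver, listener, "match" if receiver == listener else "differ")
```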

This is running on a Debian 7.4 server.

Does anyone have any thoughts as to why we might be seeing nfacctd segfault 
occasionally and also the occasional "unable to read next Data Flowset" 
messages?

Kind Regards,
Jonathan

