Hi Jonathan,

From these elements, it may be that the segfault is related to the info messages you receive: both would boil down to some spurious packets being generated by nfprobe. The best way to start debugging this is to make a capture, in libpcap format, of the NetFlow packets hitting the collector so that I can replay them in the lab. If feasible for you, you can generate the capture on the collector box as follows:

shell> tcpdump -i <interface to listen> -s 0 -n -w jthorpe_netflow_trace.pcap port 2101

Then mail me privately the jthorpe_netflow_trace.pcap file. You can stop the trace after a couple of occurrences of the info message or, even better (but depending on how large the capture file becomes), once the collector crashes.

Cheers,
Paolo

On Thu, May 01, 2014 at 03:24:27AM +0000, Jonathan Thorpe wrote:
> Hi All,
>
> I have nfacctd 1.5.0rc2 collecting NetFlow v9 flows from a pair of pmacctd
> processes which send their flows to nfacctd.
>
> Every so often, I observe segmentation faults in nfacctd requiring me to
> restart the daemon.
>
> According to gdb, the issue is happening here (consistently):
>
> ----
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `nfacctd: Core Process [default]
> '.
> Program terminated with signal 11, Segmentation fault.
> #0 0x000000000041f89c in process_v9_packet (pkt=0x80005934b00d <Address
> 0x80005934b00d out of bounds>,
> pkt@entry=0x7fff5934ae40 "\t", len=len@entry=508,
> pptrsv=pptrsv@entry=0x7fff59339580, req=req@entry=0x7fff59338f00, version=9)
> at nfacctd.c:1197
> (gdb) info locals
> hdr_v9 = 0x7fff5934ae40
> hdr_v10 = 0x7fff5934ae40
> template_hdr = <optimized out>
> opt_template_hdr = <optimized out>
> tpl = <optimized out>
> data_hdr = 0x80005934b00d
> pptrs = 0x7fff59339580
> fid = <optimized out>
> off = 461
> flowoff = <optimized out>
> flowsetlen = <optimized out>
> direction = 38272
> FlowSeqInc = 1
> HdrSz = <optimized out>
> SourceId = <optimized out>
> FlowSeq = <optimized out>
> (gdb) info args
> pkt = 0x80005934b00d <Address 0x80005934b00d out of bounds>
> len = 508
> pptrsv = 0x7fff59339580
> req = 0x7fff59338f00
> version = 9
> (gdb)
> ----
>
> I'm not an expert at understanding the gdb output, but would be happy to
> provide the gdb output if anyone would like to have a look.
>
> It's not clear if these are in some way related to these messages, which are
> frequently seen in the nfacct log (but appear harmless):
>
> ----
> May 01 03:15:46 INFO: unable to read next Data Flowset (incomplete NetFlow
> v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
> May 01 03:15:53 INFO: unable to read next Data Flowset (incomplete NetFlow
> v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
> May 01 03:16:11 INFO: unable to read next Data Flowset (incomplete NetFlow
> v9/IPFIX packet): nfacctd=127.0.0.1:2101 agent=127.0.0.1:48462
> ----
>
> There are two pmacct (1.5.0rc2) instances serving as nfprobes that comprise
> the following configuration. The configs are the same, but have a different
> nfprobe_engine (0:1 and 0:2) for each one.
>
> ---
> ! pmacctd configuration
> daemonize: true
> pidfile: /var/run/pmacctd.eth2.pid
> ! syslog: daemon
> logfile: /var/log/pmacct/pmacctd.eth2.log
>
> interface: eth2
>
> plugins: nfprobe[probe]
> !
> nfprobe_version: 9
> nfprobe_receiver: 127.0.0.1:2100
> nfprobe_source_ip: 127.0.0.1
> nfprobe_direction[probe]: tag
> nfprobe_engine[probe]: 0:2
>
> !plugin_buffer_size: 819200
> !plugin_pipe_size: 1638400000
>
> plugin_buffer_size: 16384
> plugin_pipe_size: 32768000
>
> !
> aggregate: dst_host, src_host, src_mac, dst_mac, vlan, proto, dst_port,
> src_port, tag
> !
> pre_tag_map: /etc/pmacct/pretag.map
> refresh_maps: true
> pre_tag_map_entries: 3840
> ---
>
> The nfacct collector (that shows the above warnings and segfaults) contains
> the following config:
>
> ----
> ! nfacctd configuration
> daemonize: true
> debug: false
> pidfile: /var/run/nfacctd.collector.pid
> ! syslog: daemon
> logfile: /var/log/pmacct/nfacctd.collector.log
>
> ! Listen locally only
> nfacctd_ip: 127.0.0.1
> nfacctd_port: 2101
>
> nfacctd_time_new: true
>
> plugins: mysql[inbound], mysql[outbound]
>
> sql_optimize_clauses: true
>
> ! Tables for traffic accounting
> aggregate[inbound]: src_mac, dst_mac, vlan, tag, tag2, dst_host
> aggregate[outbound]: src_mac, dst_mac, vlan, tag, tag2, src_host
>
> sql_table[inbound]: acct_v8_5m_in
> sql_table[outbound]: acct_v8_5m_out
>
> sql_history_roundoff[inbound]: m
> sql_history_roundoff[outbound]: m
>
> sql_history[inbound]: 5m
> sql_refresh_time[inbound]: 300
> sql_history[outbound]: 5m
> sql_refresh_time[outbound]: 300
>
> sql_dont_try_update[inbound]: true
> sql_dont_try_update[outbound]: true
> sql_multi_values[inbound]: 1024000
> sql_multi_values[outbound]: 1024000
>
> ! End tables for traffic accounting
>
> !plugin_buffer_size: 819200
> !plugin_pipe_size: 1638400000
>
> !plugin_buffer_size: 8192
> !plugin_pipe_size: 16384000
>
> plugin_buffer_size: 163840
> plugin_pipe_size: 32768000
>
> pre_tag_map: /etc/pmacct/pretag-netflow.map
>
> pre_tag_filter[inbound]: 1
> pre_tag_filter[outbound]: 2
>
> refresh_maps: true
> pre_tag_map_entries: 3840
>
> sql_host: localhost
> sql_user: <removed>
> sql_db: <removed>
> sql_passwd: <removed>
>
> ! in case of emergency, log to this file
> sql_recovery_logfile[inbound]: /var/lib/pmacct/recovery-in_log
> sql_recovery_logfile[outbound]: /var/lib/pmacct/recovery-out_log
> ----
>
> This is running from a Debian 7.4 server.
>
> Does anyone have any thoughts as to why we might be seeing nfacctd segfault
> occasionally and also the occasional "unable to read next Data Flowset"
> messages?
>
> Kind Regards,
> Jonathan
>
> _______________________________________________
> pmacct-discussion mailing list
> http://www.pmacct.net/#mailinglists
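
The "unable to read next Data Flowset" message and the out-of-bounds data_hdr in the backtrace may both point at FlowSet lengths that do not fit inside the received packet. As a rough way to check captured payloads for that before mailing the trace, here is a minimal Python sketch. It assumes only the NetFlow v9 layout from RFC 3954 (a 20-byte packet header followed by FlowSets, each starting with a 2-byte FlowSet ID and a 2-byte length that includes that 4-byte FlowSet header). The function name check_v9_packet is hypothetical, this is not how nfacctd itself parses packets, and extracting the UDP payloads from the pcap file is left out.

```python
import struct

V9_HEADER_LEN = 20       # version, count, sysUptime, unix_secs, sequence, source_id
FLOWSET_HEADER_LEN = 4   # flowset_id, length (length includes these 4 bytes)


def check_v9_packet(payload: bytes) -> list:
    """Walk the FlowSets of one NetFlow v9 UDP payload.

    Returns a list of human-readable problems; an empty list means every
    FlowSet length fits exactly within the payload. Hypothetical helper,
    based only on the RFC 3954 layout.
    """
    if len(payload) < V9_HEADER_LEN:
        return ["payload shorter than the NetFlow v9 header"]

    version, _count = struct.unpack_from("!HH", payload, 0)
    if version != 9:
        return [f"not NetFlow v9 (version={version})"]

    problems = []
    off = V9_HEADER_LEN
    while off < len(payload):
        remaining = len(payload) - off
        if remaining < FLOWSET_HEADER_LEN:
            problems.append(
                f"offset {off}: {remaining} trailing byte(s), "
                "too short for a FlowSet header"
            )
            break
        flowset_id, flowset_len = struct.unpack_from("!HH", payload, off)
        if flowset_len < FLOWSET_HEADER_LEN:
            # A length below 4 would stall or confuse any parser.
            problems.append(
                f"offset {off}: FlowSet {flowset_id} declares length {flowset_len}"
            )
            break
        if flowset_len > remaining:
            # The declared FlowSet runs past the end of the packet.
            problems.append(
                f"offset {off}: FlowSet {flowset_id} declares length "
                f"{flowset_len}, only {remaining} byte(s) left"
            )
            break
        off += flowset_len
    return problems


if __name__ == "__main__":
    # Synthetic example: one 8-byte FlowSet, then the same packet truncated.
    header = struct.pack("!HHIIII", 9, 1, 0, 0, 0, 0)
    flowset = struct.pack("!HH", 256, 8) + b"\x00" * 4
    print(check_v9_packet(header + flowset))
    print(check_v9_packet((header + flowset)[:-2]))
```

Any payload for which this reports a problem would be a reasonable candidate to look at first when replaying the trace, although a clean result here does not rule out other malformed content such as a data record that does not match its template.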
