On Thu, May 2, 2013 at 4:37 PM, Paolo Lucente <[email protected]> wrote:
> If both processes drop to zero CPU utilization then it looks like the
> issue might be in what is feeding pmacct. Although pmacctd protects from
> interface flaps (ie. if the interface drops, it tries to re-bind) can
> you check your system logs to spot if there has been any link down-ups?

No, the links are stable. There are no link down/up events in the kernel
log. Both processes just hang. I tried attaching to them with strace: one
of them was stuck in a futex call, and the other shows up as being in
restart_syscall.

> What OS are you running this and what mechanism are you using to feed
> pmacct (plain libpcap, libpcap-mmap, PF_RING, etc.)? What version of
> pmacct are you running - and can you find anything relevant in pmacct
> log file, if one is configured?

The OS is Ubuntu 13.04, and pmacct is from packages, version 0.14.0. The
pmacct package in Ubuntu/Debian depends on the libpcap library, so I
suppose it is using libpcap.

pmacct sends its logs to syslog. I don't see anything suspicious except
lines like this:

pmacctd[30646]: INFO ( default/core ): short IPv4 packet read (37/38/frags). Snaplen issue ?

I suppose this is not a problem: only a fragment of an IPv4 packet is
received, so pmacct cannot see any higher-level protocol fields beyond
IP. When pmacct hangs, logging stops, so there is no clue as to what is
wrong.

> It would be ideal, if you manage to reproduce the issue, if you could
> provide remote-access for an inspection - if i find you positive on this,
> please follow-up privately.

Well, this happens several times per day. Maybe I can collect some core
dumps or whatever else is needed if you give me instructions? As it is
running on a production router (if you can call it "production" with
unstable traffic accounting ;-), we restart it as soon as the problem
appears.

Actually, there are two pmacct instances running, because a single
instance maxes out a CPU core. We run one instance for the inbound
direction and one for the outbound direction, so the load is spread
across cores. A single instance hangs the same way, so the problem is
not related to running two instances.

This is the config file for one direction; the second is the same with
the direction reversed in pcap_filter:

daemonize: false
pidfile: /var/run/pmacctd.eth0.19-in.pid
syslog: daemon
pcap_filter: src net <NET1>/20 or src net <NET2>/18
interface: eth0.19
promisc: false
plugins: nfprobe
nfprobe_receiver: 172.19.200.19:9998
nfprobe_version: 5
nfprobe_timeouts: maxlife=120:general=15:tcp=15:tcp.rst=15:tcp.fin=15:udp=15:icmp=15:expint=15
nfprobe_maxflows: 200000
nfprobe_source_ip: <IP>

--
Timur Irmatov
_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
