Hi,

As we had quite a few "missing data" messages, I was excited to see that 1.7.0 
makes the ZeroMQ config very easy, 
since we were still running with the home-grown circular buffer.


So last week I upgraded our systems from pmacct 1.6.1 to 1.7.0.

Since then I have been experiencing issues on 2 out of 7 systems: the oom-killer 
fires on one of the pmacct MySQL plugins.

[Mon Nov  6 09:54:38 2017] pmacctd invoked oom-killer: gfp_mask=0x280da, 
order=0, oom_score_adj=0
[Mon Nov  6 09:54:38 2017] pmacctd cpuset=/ mems_allowed=0-1
[Mon Nov  6 09:54:38 2017] CPU: 15 PID: 44049 Comm: pmacctd Tainted: G          
 OE  ------------   3.10.0-693.5.2.el7.x86_64 #1
[Mon Nov  6 09:54:38 2017] Hardware name: Dell Inc. PowerEdge R630, BIOS 2.4.3 
01/17/2017
[Mon Nov  6 09:54:38 2017]  ffff88103dd4dee0 0000000019a0de94 ffff880036adf5f0 
ffffffff816a3e51
[Mon Nov  6 09:54:38 2017]  ffff880036adf680 ffffffff8169f246 ffff880036adf688 
ffffffff812b7d1b
[Mon Nov  6 09:54:38 2017]  ffff88203c336e68 0000000000000202 ffffffff00000202 
fffeefff00000000
[Mon Nov  6 09:54:38 2017] Call Trace:
[Mon Nov  6 09:54:38 2017]  [<ffffffff816a3e51>] dump_stack+0x19/0x1b
[Mon Nov  6 09:54:38 2017]  [<ffffffff8169f246>] dump_header+0x90/0x229
[Mon Nov  6 09:54:38 2017]  [<ffffffff812b7d1b>] ? 
cred_has_capability+0x6b/0x120
[Mon Nov  6 09:54:38 2017]  [<ffffffff811863a4>] oom_kill_process+0x254/0x3d0
[Mon Nov  6 09:54:38 2017]  [<ffffffff812b7efe>] ? selinux_capable+0x2e/0x40
[Mon Nov  6 09:54:38 2017]  [<ffffffff81186be6>] out_of_memory+0x4b6/0x4f0
[Mon Nov  6 09:54:38 2017]  [<ffffffff8169fd4a>] 
__alloc_pages_slowpath+0x5d6/0x724
[Mon Nov  6 09:54:38 2017]  [<ffffffff8118cdb5>] 
__alloc_pages_nodemask+0x405/0x420
[Mon Nov  6 09:54:38 2017]  [<ffffffff811d40a5>] alloc_pages_vma+0xb5/0x200
[Mon Nov  6 09:54:38 2017]  [<ffffffff811b2350>] handle_mm_fault+0xb60/0xfa0
[Mon Nov  6 09:54:38 2017]  [<ffffffff810c8f28>] ? __enqueue_entity+0x78/0x80
[Mon Nov  6 09:54:38 2017]  [<ffffffff816b0074>] __do_page_fault+0x154/0x450
[Mon Nov  6 09:54:38 2017]  [<ffffffff816b03a5>] do_page_fault+0x35/0x90
[Mon Nov  6 09:54:38 2017]  [<ffffffff816ac5c8>] page_fault+0x28/0x30
[Mon Nov  6 09:54:38 2017]  [<ffffffff81330379>] ? 
copy_user_enhanced_fast_string+0x9/0x20
[Mon Nov  6 09:54:38 2017]  [<ffffffff81336a4a>] ? memcpy_toiovec+0x4a/0x90
[Mon Nov  6 09:54:38 2017]  [<ffffffff815796e8>] 
skb_copy_datagram_iovec+0x128/0x280
[Mon Nov  6 09:54:38 2017]  [<ffffffff815d88aa>] tcp_recvmsg+0x24a/0xb50
[Mon Nov  6 09:54:38 2017]  [<ffffffff81606aea>] inet_recvmsg+0x7a/0xa0
[Mon Nov  6 09:54:38 2017]  [<ffffffff8156a88f>] sock_recvmsg+0xbf/0x100
[Mon Nov  6 09:54:38 2017]  [<ffffffff815da029>] ? tcp_poll+0x219/0x230
[Mon Nov  6 09:54:38 2017]  [<ffffffff8124b859>] ? 
ep_scan_ready_list.isra.7+0x1b9/0x1f0
[Mon Nov  6 09:54:38 2017]  [<ffffffff8156aa08>] SYSC_recvfrom+0xe8/0x160
[Mon Nov  6 09:54:38 2017]  [<ffffffff8156b2fe>] SyS_recvfrom+0xe/0x10
[Mon Nov  6 09:54:38 2017]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
[Mon Nov  6 09:54:38 2017] Mem-Info:
[Mon Nov  6 09:54:38 2017] active_anon:31102734 inactive_anon:1375631 
isolated_anon:64
 active_file:61 inactive_file:0 isolated_file:0
 unevictable:0 dirty:0 writeback:200 unstable:0
 slab_reclaimable:10481 slab_unreclaimable:34483
 mapped:10650 shmem:9528 pagetables:66634 bounce:0
 free:88657 free_pcp:30 free_cma:0
[Mon Nov  6 09:54:38 2017] Node 0 DMA free:15864kB min:8kB low:8kB high:12kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB 
managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB 
slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB 
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Mon Nov  6 09:54:38 2017] lowmem_reserve[]: 0 1690 64141 64141
[Mon Nov  6 09:54:38 2017] Node 0 DMA32 free:250920kB min:1184kB low:1480kB 
high:1776kB active_anon:1096172kB inactive_anon:365444kB active_file:0kB 
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1985264kB managed:1733112kB mlocked:0kB dirty:0kB writeback:0kB 
mapped:484kB shmem:488kB slab_reclaimable:456kB slab_unreclaimable:3256kB 
kernel_stack:224kB pagetables:2600kB unstable:0kB bounce:0kB free_pcp:120kB 
local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 
all_unreclaimable? yes
[Mon Nov  6 09:54:38 2017] lowmem_reserve[]: 0 0 62451 62451
[Mon Nov  6 09:54:38 2017] Node 0 Normal free:42772kB min:43740kB low:54672kB 
high:65608kB active_anon:60618136kB inactive_anon:2525396kB active_file:264kB 
inactive_file:0kB unevictable:0kB isolated(anon):128kB isolated(file):0kB 
present:65011712kB managed:63949968kB mlocked:0kB dirty:0kB writeback:376kB 
mapped:21788kB shmem:21608kB slab_reclaimable:20352kB 
slab_unreclaimable:65620kB kernel_stack:5856kB pagetables:129356kB unstable:0kB 
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:403 all_unreclaimable? yes
[Mon Nov  6 09:54:38 2017] lowmem_reserve[]: 0 0 0 0
[Mon Nov  6 09:54:38 2017] Node 1 Normal free:45072kB min:45172kB low:56464kB 
high:67756kB active_anon:62696628kB inactive_anon:2611684kB active_file:0kB 
inactive_file:0kB unevictable:0kB isolated(anon):128kB isolated(file):0kB 
present:67108864kB managed:66046872kB mlocked:0kB dirty:0kB writeback:424kB 
mapped:20328kB shmem:16016kB slab_reclaimable:21116kB 
slab_unreclaimable:69024kB kernel_stack:4976kB pagetables:134580kB unstable:0kB 
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB 
pages_scanned:1984 all_unreclaimable? yes
[Mon Nov  6 09:54:38 2017] lowmem_reserve[]: 0 0 0 0
[Mon Nov  6 09:54:38 2017] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 1*32kB (U) 
1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB 
(M) = 15864kB
[Mon Nov  6 09:54:38 2017] Node 0 DMA32: 220*4kB (UE) 253*8kB (UE) 182*16kB 
(UEM) 64*32kB (UEM) 76*64kB (UEM) 45*128kB (UEM) 38*256kB (UEM) 39*512kB (UEM) 
32*1024kB (UE) 29*2048kB (U) 27*4096kB (UM) = 250936kB
[Mon Nov  6 09:54:38 2017] Node 0 Normal: 2520*4kB (UE) 1980*8kB (UEM) 788*16kB 
(UEM) 82*32kB (UEM) 24*64kB (UEM) 8*128kB (UEM) 1*256kB (M) 0*512kB 0*1024kB 
0*2048kB 0*4096kB = 43968kB
[Mon Nov  6 09:54:38 2017] Node 1 Normal: 2140*4kB (UE) 4637*8kB (UM) 82*16kB 
(UM) 1*32kB (M) 2*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
47128kB
[Mon Nov  6 09:54:38 2017] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=1048576kB
[Mon Nov  6 09:54:38 2017] Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=2048kB
[Mon Nov  6 09:54:38 2017] Node 1 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=1048576kB
[Mon Nov  6 09:54:38 2017] Node 1 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=2048kB
[Mon Nov  6 09:54:38 2017] 13517 total pagecache pages
[Mon Nov  6 09:54:38 2017] 4044 pages in swap cache
[Mon Nov  6 09:54:38 2017] Swap cache stats: add 4249745, delete 4245701, find 
35957085/35970537
[Mon Nov  6 09:54:38 2017] Free swap  = 0kB
[Mon Nov  6 09:54:38 2017] Total swap = 4194300kB
[Mon Nov  6 09:54:38 2017] 33530455 pages RAM
[Mon Nov  6 09:54:38 2017] 0 pages HighMem/MovableOnly
[Mon Nov  6 09:54:38 2017] 593993 pages reserved
[Mon Nov  6 09:54:38 2017] [ pid ]   uid  tgid total_vm      rss nr_ptes 
swapents oom_score_adj name
[Mon Nov  6 09:54:38 2017] [  771]     0   771    17483     7670      41       
51             0 systemd-journal
[Mon Nov  6 09:54:38 2017] [  802]     0   802    11959       25      24      
735         -1000 systemd-udevd
[Mon Nov  6 09:54:38 2017] [ 2444]     0  2444    13863       10      26      
101         -1000 auditd
[Mon Nov  6 09:54:38 2017] [ 2463]     0  2463     5468       89      15       
80             0 irqbalance
[Mon Nov  6 09:54:38 2017] [ 2467]    81  2467     8153       62      18       
49          -900 dbus-daemon
[Mon Nov  6 09:54:38 2017] [ 2482]     0  2482     6051       43      17       
32             0 systemd-logind
[Mon Nov  6 09:54:38 2017] [ 2483]   998  2483   133561       96      58     
1532             0 polkitd
[Mon Nov  6 09:54:38 2017] [ 2484]     0  2484    75472     3998      66      
835             0 rsyslogd
[Mon Nov  6 09:54:38 2017] [ 2554]     0  2554    31558       26      18      
132             0 crond
[Mon Nov  6 09:54:38 2017] [ 2578]     0  2578    27511        1      10       
31             0 agetty
[Mon Nov  6 09:54:38 2017] [ 2585]   997  2585    25108       30      20       
62             0 chronyd
[Mon Nov  6 09:54:38 2017] [ 3055]     0  3055    26499       13      55      
232         -1000 sshd
[Mon Nov  6 09:54:38 2017] [ 3057]     0  3057   140598      106      88     
2614             0 tuned
[Mon Nov  6 09:54:38 2017] [ 3524]     0  3524    22504        4      44      
275             0 master
[Mon Nov  6 09:54:38 2017] [ 3526]    89  3526    22547       14      45      
260             0 qmgr
[Mon Nov  6 09:54:38 2017] [ 3730]     0  3730   247949      257      67     
4361             0 dsm_sa_datamgrd
[Mon Nov  6 09:54:38 2017] [ 3803]     0  3803    75246       92      40      
126             0 dsm_sa_eventmgr
[Mon Nov  6 09:54:38 2017] [ 3828]     0  3828   111461      494      51      
879             0 dsm_sa_snmpd
[Mon Nov  6 09:54:38 2017] [ 3834]     0  3834   180364        6      59     
4326             0 dsm_sa_datamgrd
[Mon Nov  6 09:54:38 2017] [ 3877]     0  3877   158222       21      41      
672             0 dsm_om_shrsvcd
[Mon Nov  6 09:54:38 2017] [44029]     0 44029    96018     3183      46      
547             0 pmacctd
[Mon Nov  6 09:54:38 2017] [44030]     0 44030  3861827  3769206    7400      
287             0 pmacctd
[Mon Nov  6 09:54:38 2017] [44037]     0 44037 29678700 28615469   57953   
997921             0 pmacctd
[Mon Nov  6 09:54:38 2017] [44038]     0 44038   112024    72059     184     
4966             0 pmacctd
[Mon Nov  6 09:54:38 2017] [44045]     0 44045    46356     2918      49     
6122             0 pmacctd
[Mon Nov  6 09:54:38 2017] [44046]     0 44046    46389     3147      49     
5705             0 pmacctd
[Mon Nov  6 09:54:38 2017] [58219]    89 58219    22530      272      44        
0             0 pickup
[Mon Nov  6 09:54:38 2017] [59874]     0 59874    47222     3116      51     
6044             0 pmacctd
[Mon Nov  6 09:54:38 2017] [59875]     0 59875    47225     3665      53     
5494             0 pmacctd
[Mon Nov  6 09:54:38 2017] Out of memory: Kill process 44037 (pmacctd) score 
846 or sacrifice child
[Mon Nov  6 09:54:38 2017] Killed process 44037 (pmacctd) total-vm:118714800kB, 
anon-rss:114461768kB, file-rss:104kB, shmem-rss:4kB
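
To put the process table above into perspective, here is a minimal Python sketch that converts the rss and swapents columns from pages to KiB/GiB and picks out the largest consumer. It assumes 4 KiB pages (the usual x86_64 default), and the rows are copied from the report with the mail client's line wrapping undone. The names parse_rows and biggest are made up for this illustration.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, the usual x86_64 default

# Rows copied from the oom-killer process table, with line wrapping undone.
OOM_ROWS = """\
[44029]     0 44029    96018     3183      46      547             0 pmacctd
[44030]     0 44030  3861827  3769206    7400      287             0 pmacctd
[44037]     0 44037 29678700 28615469   57953   997921             0 pmacctd
[44038]     0 44038   112024    72059     184     4966             0 pmacctd
"""

# Columns: [pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+\d+\s+\d+\s+(?P<total_vm>\d+)\s+(?P<rss>\d+)"
    r"\s+\d+\s+(?P<swapents>\d+)\s+-?\d+\s+(?P<name>\S+)"
)


def parse_rows(text):
    """Parse process-table rows into dicts with sizes converted to KiB."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.search(line)
        if m:
            rows.append({
                "pid": int(m.group("pid")),
                "name": m.group("name"),
                "rss_kib": int(m.group("rss")) * PAGE_KIB,
                "swap_kib": int(m.group("swapents")) * PAGE_KIB,
            })
    return rows


def biggest(rows):
    """Return the row with the largest resident set size."""
    return max(rows, key=lambda r: r["rss_kib"])


if __name__ == "__main__":
    for r in sorted(parse_rows(OOM_ROWS), key=lambda r: -r["rss_kib"]):
        print(f'{r["pid"]:>6} {r["name"]:<8} '
              f'rss={r["rss_kib"] / 2**20:7.2f} GiB '
              f'swap={r["swap_kib"] / 2**20:5.2f} GiB')
```

With these rows, PID 44037 comes out at 114461876 KiB resident (roughly 109 GiB), which matches the sum of anon-rss, file-rss and shmem-rss in the "Killed process" line, so a single pmacctd process holds nearly all of the machine's memory.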

All 7 systems are identical in terms of config; they only receive different 
traffic and have slightly different HW configs
(CPU, R630 vs R720, 64G to 128G of memory).

Each system runs CentOS 7.4.1708 64-bit, fully updated, has a dual-port Intel 
X520 10G NIC, 
and runs one pmacctd instance for 10G NIC1 and one for 10G NIC2.
Per instance, traffic is split out over an IPv4 MySQL plugin and an IPv6 MySQL 
plugin.

Data is stored on an external MySQL (Percona) server.

As CentOS 7.x EPEL comes with ZeroMQ 4.1, and pmacct likes >= 4.2, 
I installed ZeroMQ 4.2.2 from the ZeroMQ yum repository.

Eager as I was, I also installed PF_RING 7.0.0 (non-ZC to start with) from the 
ntop repository in the same change.

After some time running, I observed the oom-killer issue on the two machines.


I suspected PF_RING at first. It was running with the following config:

options pf_ring enable_tx_capture=0 quick_mode=1

Then I reduced that on those 2 machines to:

options pf_ring enable_tx_capture=0


That seems to work _so_ far on 1 of them, but on the other ... no change.


Then I removed PF_RING completely from that system, recompiled pmacct, and made 
sure pmacctd 
was now linked to libpcap.so.1 again, and no longer against libpfring.so.1
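
The linkage check described above can be scripted. The following is a minimal Python sketch that runs ldd on the pmacctd binary and reports which capture libraries it names. The function names and the sample ldd output are hypothetical and only for illustration. Note that the soname alone may not tell the whole story if a PF_RING-aware libpcap build is installed under the plain libpcap name.

```python
import re
import shutil
import subprocess

# Matches lines such as "\tlibpcap.so.1 => /lib64/libpcap.so.1 (0x...)".
CAPTURE_LIB_RE = re.compile(
    r"^\s*(lib(?:pcap|pfring)\.so[.\d]*)\s", re.MULTILINE
)

# Hypothetical ldd output, used only to demonstrate the parser.
SAMPLE_LDD = (
    "\tlinux-vdso.so.1 =>  (0x00007ffd00000000)\n"
    "\tlibpcap.so.1 => /lib64/libpcap.so.1 (0x00007f0000000000)\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000200000)\n"
)


def capture_libs(ldd_output):
    """Return the libpcap / libpfring sonames that appear in ldd output."""
    return CAPTURE_LIB_RE.findall(ldd_output)


def check_binary(name="pmacctd"):
    """Run ldd on the named binary; return None if it is not in PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return capture_libs(result.stdout)


if __name__ == "__main__":
    print("sample :", capture_libs(SAMPLE_LDD))
    print("pmacctd:", check_binary())
```

If libpfring.so.1 still shows up for the real binary, the rebuild picked up the PF_RING libraries again.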


This morning, another crash... so it does not seem (fully) related to PF_RING 
or its config.



So the only other change between 1.6.1 and 1.7.0 on these machines was that 
pmacct is now compiled with the additional option 
"--enable-zmq"

And in the config I replaced plugin_buffer_size and plugin_pipe_size with:

plugin_pipe_zmq: true
plugin_pipe_zmq_profile: large



I will probably recompile once more without ZeroMQ, and revert the config 
change, and see how that goes.
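
To compare the runs with and without ZeroMQ, it may help to record the resident memory of every pmacctd process at regular intervals, so that growth can be tied to one specific process. The following is a minimal Python sketch that reads VmRSS from /proc/<pid>/status on Linux. The function names are made up for this illustration, and the one-shot output is meant to be called periodically, for example from cron.

```python
import os
import re
import time

NAME_RE = re.compile(r"^Name:\s+(\S+)", re.MULTILINE)
VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_status(status_text):
    """Return (process name, VmRSS in kB) from /proc/<pid>/status text."""
    name = NAME_RE.search(status_text)
    rss = VMRSS_RE.search(status_text)
    return (name.group(1) if name else None,
            int(rss.group(1)) if rss else 0)


def sample_rss(process_name="pmacctd", proc_root="/proc"):
    """Map PID -> VmRSS (kB) for all processes with the given name."""
    result = {}
    if not os.path.isdir(proc_root):
        return result  # not a Linux-style /proc; nothing to sample
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path, errors="replace") as fh:
                name, rss = parse_status(fh.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if name == process_name:
            result[int(entry)] = rss
    return result


if __name__ == "__main__":
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for pid, rss_kb in sorted(sample_rss().items()):
        print(f"{stamp} pid={pid} rss_kb={rss_kb}")
```

Appending this output to a file once a minute gives a per-process memory curve that can be lined up against the time of the next oom-killer event.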


But it would be nice to get to a stable system with all features enabled, so if 
anyone has good hints on what to check, I would like to hear them.

Any help/insight is appreciated :)


Regards,

Wouter

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists