Hi,

On 23 June 2017 at 12:54:36 AM, Jerry Jelinek ([email protected]) wrote:

> 1) In your zone you are trying to use a lot more physical memory than the limit you have set for the zone. The overall thrashing behavior you have described sounds like what would be expected in this case.

So, there's a lot I don't understand about SmartOS memory. If I set up 4-physical/8-swap and allocate 4GB inside the zone, it shows (under zonememstat) as being 50% full. I take this to mean that the maximum that can be allocated inside the zone is 8GB, and that the paging mechanism is responsible for deciding which bit is physical and which bit is disk.

Here's an example on a 32GB machine (with 64GB NVMe swap):

[root@tiny ~]# zonememstat -t
                      ZONE  RSS(MB)  CAP(MB)  NOVER  POUT(MB)  SWAP%
                    global        0        -      -         -      -
ctr-vKUWfLdjACa6aUt3fRTt5P        2     8192      0         0  87.57
ctr-quLfQMoeND3yrnZj5aWgeL        2     8192      0         0  87.57
ctr-5NtmqXi98tsFTqYBRNH3ZU        2     8192      0         0  87.57
ctr-X33PVA7jEXVsVx3ZhT2sJ5        2     8192      0         0  87.57
                     total        8    32768      0         0      -

Each zone is running debian (lx) and has 7GB allocated in it (by python3). The edited highlights of ps from the global zone:

root  4564  0.0 22.0 7369636 7347664 ?  S 05:15:02  0:03 python3 /incremental_alloc.py 7
root  4579  0.0 22.0 7369636 7347668 ?  S 05:15:09  0:03 python3 /incremental_alloc.py 7
root  4594  0.0 22.0 7369636 7347660 ?  S 05:15:15  0:04 python3 /incremental_alloc.py 7
root  4609  0.0 22.0 7369636 7347664 ?  S 05:15:19  0:04 python3 /incremental_alloc.py 7

...disagrees on the RSS of each zone (7347664) but at least we can see the allocation.
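
As a rough cross-check of the figures above, the ps numbers can be compared with the zone cap and with the size of the machine. This is only a sketch: it assumes that ps reports RSS in kilobytes and that "32GB" means 32 GiB, and the function names are made up for illustration. It says nothing about how zonememstat actually derives its SWAP% column.

```python
# Figures copied from the zonememstat and ps output above.
# Assumptions (not stated in the original): ps RSS is in KiB,
# and the "32GB machine" has 32 GiB of RAM.
KIB_PER_MIB = 1024
MIB_PER_GIB = 1024

ZONE_CAP_MIB = 8192
MACHINE_RAM_GIB = 32
PS_RSS_KIB = [7347664, 7347668, 7347660, 7347664]


def rss_as_percent_of_cap(rss_kib, cap_mib):
    """Express one process's RSS as a percentage of the zone cap."""
    return 100.0 * (rss_kib / KIB_PER_MIB) / cap_mib


def total_rss_gib(rss_kib_values):
    """Sum several RSS values (KiB) and convert the total to GiB."""
    return sum(rss_kib_values) / KIB_PER_MIB / MIB_PER_GIB


if __name__ == "__main__":
    for rss in PS_RSS_KIB:
        pct = rss_as_percent_of_cap(rss, ZONE_CAP_MIB)
        print(f"{rss} KiB is {pct:.2f}% of the {ZONE_CAP_MIB} MiB cap")
    used = total_rss_gib(PS_RSS_KIB)
    print(f"total RSS across the four zones: {used:.2f} GiB")
    print(f"left out of {MACHINE_RAM_GIB} GiB: {MACHINE_RAM_GIB - used:.2f} GiB")
```

Under those assumptions the four python3 processes together hold about 28 GiB, so the sum of the zone caps (32768 MB) equals the whole of physical memory, and whatever is left has to cover the kernel, the ARC and the global zone.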

Swap agrees with ps:

[root@tiny ~]# swap -lh
swapfile                  dev    swaplo   blocks     free
/dev/zvol/dsk/zones/swap  90,1       4K      64G      64G

As does vmstat:

[root@tiny ~]# vmstat -S 1
kthr       memory                       page                     disk           faults         cpu
r b w     swap     free si so  pi    po    fr     de      sr bk bk lf  rm     in   sy     cs us sy id
0 0 0 78725148 11829532  0  0 182     0     0      0    2373 -0 20 37 -11   3039 5894   2034  0  1 98
0 0 0 68523768  1342112  0  0   0     0     0      0       0  0  0  0   0   2735 1364    471  0  1 99
0 0 0 68523688  1342032  0  0   0     0     0      0       0  0  0  0   0   2544  649    370  0  1 99
0 0 0 68523688  1342032  0  0   0     0     0      0       0  0  6  0   0   2875 2008   1501  0  1 98

If I now allocate 1GB more in one of the zones...

[root@tiny ~]# vmstat -S 1
kthr       memory                       page                     disk           faults         cpu
r b w     swap     free si so  pi    po    fr     de      sr bk bk lf  rm     in   sy     cs us sy id
0 0 0 77583364 10655732  0  0 161     0     0      0    2107 -0 19 37 -11   2988 5306   1861  0  1 98
0 0 0 68523744  1342084  0  0   0     0     0      0       0  0  0  0   0   2670  691    457  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  8  0   0   2720  660   1230  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  0  0   0   2633 2000    541  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  0  0   0   2584  646    410  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  0  0   0   2504  652    361  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  0  0   0   2544  646    406  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  0  0   0   2608 1311    498  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  0  0   0   2661  646    431  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  0  0   0   2482  653    339  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  0  0   0   2556  646    409  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  0  0   0   2660 1300    530  0  1 99
0 0 0 68523664  1342004  0  0   0     0     0      0       0  0  0  0   0   2898 1211    593  2  4 94
0 0 0 67857516   675848  0  0   0     0     0 522324 1219476  0  0  0   0   6026 2471   1707  1 12 86
0 0 0 67472216   290548  0  0  80     0     0 470092 1026886  0  0  0  15 652865  650   2424  0 26 74
0 0 0 67471628   289852  0  0   0     0  2652 423084 1109202  0  0  0   0 684690  649    919  0 29 71
0 0 0 67471576   292452  0  0   0 74256 75940 380776 1476922  0 98  0   0 533368  646 431081  0 29 71
0 0 0 67147676    42084  0  0   0     0  2712 342700 1429250  0  0  0   0   2906  659   1940  0 11 89
0 0 0 67147440    44564  0  0   0     0 45422 308432 1426855  0  0  0   0   2615  630  23296  0 11 89

All hell breaks loose and 25 seconds later the box locks entirely (full trace at https://gist.github.com/RantyDave/218f8f3bab74fa623677b450f372286e).

kthr       memory                       page                     disk           faults         cpu
r b w     swap     free si so  pi    po    fr     de      sr bk bk lf  rm     in   sy     cs us sy id
0 0 6 67131172   119868  0  0   0     0     0  37512  804536  0  0  0   0   2488  596    722  0  9 91
0 0 6 67131172   119868  0  0   0     0     0  33764  732708  0  0  0   0   2483  607   1307  0 10 90
0 0 6 67131172   119868  0  0   0     0     0  30388  717851  0  0  0   0   2469  596    902  0  9 91

The machine will ping, and you can type into the console but not log in. This actually went a lot less well than I thought it would. Power cycling the machine and there's nothing in /cores or /var/cores.

Edited highlights of the zone conf (from zonecfg info):

zonename: ctr-quLfQMoeND3yrnZj5aWgeL
zonepath: /zones/ctr-quLfQMoeND3yrnZj5aWgeL
brand: lx
limitpriv: default
scheduling-class:
ip-type: exclusive
hostid:
fs-allowed:
uuid: ac9b7109-536f-c632-dd37-8dddaab0cc4b
[max-lwps: 2000]
[max-shm-memory: 8G]
[max-shm-ids: 4096]
[max-msg-ids: 4096]
[max-sem-ids: 4096]
[cpu-shares: 100]
net: (snipped)
capped-memory:
        [physical: 8G]
        [swap: 8G]
        [locked: 8G]
(more snip)
attr:
        name: docker
        type: string
        value: true
attr:
        name: init-name
        type: string
        value: /native/usr/vm/sbin/dockerinit
attr:
        name: kernel-version
        type: string
        value: 3.16
rctl:
        name: zone.max-lwps
        value: (priv=privileged,limit=2000,action=deny)
rctl:
        name: zone.max-shm-memory
        value: (priv=privileged,limit=8589934592,action=deny)
rctl:
        name: zone.max-shm-ids
        value: (priv=privileged,limit=4096,action=deny)
rctl:
        name: zone.max-sem-ids
        value: (priv=privileged,limit=4096,action=deny)
rctl:
        name: zone.max-msg-ids
        value: (priv=privileged,limit=4096,action=deny)
rctl:
        name: zone.cpu-shares
        value: (priv=privileged,limit=100,action=none)
rctl:
        name: zone.max-physical-memory
        value: (priv=privileged,limit=8589934592,action=deny)
rctl:
        name: zone.max-swap
        value: (priv=privileged,limit=8589934592,action=deny)
rctl:
        name: zone.max-locked-memory
        value: (priv=privileged,limit=8589934592,action=deny)

Setting physical, swap and locked the same is what happens if you make a textbook lx using vmadm.

Oh, and I set the arc to be tiny...

[root@tiny ~]# arcstat
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
05:51:43     0     0      0     0    0     0    0     0    0    67M  512M

And rcapd wasn't running. Running the test again with rcapd we get a better looking zonememstat:

[root@tiny ~]# zonememstat
                      ZONE  RSS(MB)  CAP(MB)  NOVER  POUT(MB)  SWAP%
                    global      185        -      -         -      -
ctr-wBHaBqWZvWFAp9huch5ioX     7185     8192      0         0  87.57
ctr-FHTL4BQ9Wh63kFShD6M8L6     7181     8192      0         0  87.57
ctr-AsFSSvYzomUFXJgTNSxkVH     7181     8192      0         0  87.57
ctr-RZXUVpsgAzKKmakdPD3PZN     7181     8192      0         0  87.57

The output from ps is the same. The effect of allocating the last gig is the same and because I left vfsstat running we get another smoking gun:

  r/s   w/s  kr/s    kw/s ractv wactv read_t   writ_t  %r  %w   d/s del_t zone
 34.6   1.3  15.3     0.3   0.0   0.8    2.0 643107.4   0  81   0.0   0.0 global (0)
  0.0 161.8   0.0 18877.8   0.0   0.1    0.0    581.8   0   9   0.0   0.0 ctr-wBHa (1)
  0.0   0.0   0.0     0.0   0.0   0.0    0.0      0.0   0   0   0.0   0.0 ctr-FHTL (2)
  0.0   0.0   0.0     0.0   0.0   0.0    0.0      0.0   0   0   0.0   0.0 ctr-AsFS (3)
  0.0   0.0   0.0     0.0   0.0   0.0    0.0      0.0   0   0   0.0   0.0 ctr-RZXU (4)

> 2) The process eventually terminates with a SIGBUS.

The first time through I used alpine linux, this one used debian. What I was *expecting* was that either the allocation would return NULL (possibly causing the SIGBUS), a signal would be sent to the application (same result again), or a Linux-esque OOM killer would shoot it through the head. Either which way I would've thought that preventing an ngz from taking down global would be job #1 :(

> I don't know if this is an issue with your application code or with our platform.

The application code is a test harness *just* for incrementally allocating memory: https://gist.github.com/RantyDave/c2322891f86f26f4696b3a8b3a478b62

> 3) The box eventually locks up.
> That is clearly our issue and is something we would want to investigate. Can you force a system dump and provide that to us? If you can't NMI your box when it is in this state, then you might be able to force a dump using DTrace.

Sorry, I really can't help you there.

-Dave
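
For readers who cannot open the gist, a harness of the kind described (one that does nothing but allocate memory incrementally) might look roughly like the sketch below. This is not the code from the gist. The name allocate_incrementally, the 256 MiB step and the 4096-byte page size are all assumptions made for illustration. Each chunk is written to once per page so that the memory is actually touched rather than merely reserved.

```python
MIB = 1024 * 1024
PAGE = 4096  # assumed page size; the real value depends on the platform


def allocate_incrementally(total_bytes, step_bytes=256 * MIB):
    """Allocate total_bytes in steps of step_bytes and return the chunks.

    Hypothetical helper, not taken from the gist. bytearray() may raise
    MemoryError if the allocation is refused.
    """
    chunks = []
    allocated = 0
    while allocated < total_bytes:
        size = min(step_bytes, total_bytes - allocated)
        chunk = bytearray(size)  # zero-filled buffer of `size` bytes
        # Write one byte per page so every page of the chunk gets touched.
        for offset in range(0, size, PAGE):
            chunk[offset] = 1
        chunks.append(chunk)  # keep a reference so nothing is freed
        allocated += size
    return chunks
```

If the `7` argument in the ps output is a size in GB (an assumption), the equivalent call would be allocate_incrementally(7 * 1024 * MIB), with the returned list kept alive for as long as the memory should stay allocated.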
