Hi,
On 23 June 2017 at 12:54:36 AM, Jerry Jelinek ([email protected]) wrote:

1) In your zone you are trying to use a lot more physical memory than the limit 
you have set for the zone. The overall thrashing behavior you have described 
sounds like what would be expected in this case.
So, there's a lot I don't understand about SmartOS memory. If I set up a zone 
with 4GB physical / 8GB swap and allocate 4GB inside it, it shows (under 
zonememstat) as being 50% full. I take this to mean that the maximum that can 
be allocated inside the zone is 8GB, and that the paging mechanism is 
responsible for deciding which part is in physical memory and which part is on 
disk.

Here's an example on a 32GB machine (with 64GB of NVMe swap):

[root@tiny ~]# zonememstat -t
                                 ZONE  RSS(MB)  CAP(MB)  NOVER  POUT(MB) SWAP%
                               global        0        -      -         -     -
           ctr-vKUWfLdjACa6aUt3fRTt5P        2     8192      0         0 87.57
           ctr-quLfQMoeND3yrnZj5aWgeL        2     8192      0         0 87.57
           ctr-5NtmqXi98tsFTqYBRNH3ZU        2     8192      0         0 87.57
           ctr-X33PVA7jEXVsVx3ZhT2sJ5        2     8192      0         0 87.57
                                total        8    32768      0         0     -

Each zone is running Debian (lx) and has 7GB allocated in it (by python3). 
Edited highlights of ps from the global zone:

root      4564  0.0 22.0 7369636 7347664 ?        S 05:15:02  0:03 python3 /incremental_alloc.py 7
root      4579  0.0 22.0 7369636 7347668 ?        S 05:15:09  0:03 python3 /incremental_alloc.py 7
root      4594  0.0 22.0 7369636 7347660 ?        S 05:15:15  0:04 python3 /incremental_alloc.py 7
root      4609  0.0 22.0 7369636 7347664 ?        S 05:15:19  0:04 python3 /incremental_alloc.py 7

...which disagrees with zonememstat on the RSS of each zone (7347664KB is 
roughly the 7GB allocated, not 2MB), but at least we can see the allocation. 
Swap agrees with ps:

[root@tiny ~]# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/zones/swap 90,1        4K      64G      64G

As does vmstat:

[root@tiny ~]# vmstat -S 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  si  so pi po fr de sr bk bk lf rm   in   sy   cs us sy id
 0 0 0 78725148 11829532 0 0 182 0 0  0 2373 -0 20 37 -11 3039 5894 2034 0 1 98
 0 0 0 68523768 1342112 0 0  0  0  0  0  0  0  0  0  0 2735 1364  471  0  1 99
 0 0 0 68523688 1342032 0 0  0  0  0  0  0  0  0  0  0 2544  649  370  0  1 99
 0 0 0 68523688 1342032 0 0  0  0  0  0  0  0  6  0  0 2875 2008 1501  0  1 98

If I now allocate 1GB more in one of the zones...

[root@tiny ~]# vmstat -S 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  si  so pi po fr de sr bk bk lf rm   in   sy   cs us sy id
 0 0 0 77583364 10655732 0 0 161 0 0  0 2107 -0 19 37 -11 2988 5306 1861 0 1 98
 0 0 0 68523744 1342084 0 0  0  0  0  0  0  0  0  0  0 2670  691  457  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  8  0  0 2720  660 1230  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  0  0  0 2633 2000  541  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  0  0  0 2584  646  410  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  0  0  0 2504  652  361  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  0  0  0 2544  646  406  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  0  0  0 2608 1311  498  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  0  0  0 2661  646  431  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  0  0  0 2482  653  339  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  0  0  0 2556  646  409  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  0  0  0 2660 1300  530  0  1 99
 0 0 0 68523664 1342004 0 0  0  0  0  0  0  0  0  0  0 2898 1211  593  2  4 94
 0 0 0 67857516 675848 0  0  0  0  0 522324 1219476 0 0 0 0 6026 2471 1707 1 12 86
 0 0 0 67472216 290548 0  0 80  0  0 470092 1026886 0 0 0 15 652865 650 2424 0 26 74
 0 0 0 67471628 289852 0  0  0  0 2652 423084 1109202 0 0 0 0 684690 649 919 0 29 71
 0 0 0 67471576 292452 0  0  0 74256 75940 380776 1476922 0 98 0 0 533368 646 431081 0 29 71
 0 0 0 67147676 42084 0   0  0  0 2712 342700 1429250 0 0 0 0 2906 659 1940 0 11 89
 0 0 0 67147440 44564 0   0  0  0 45422 308432 1426855 0 0 0 0 2615 630 23296 0 11 89

All hell breaks loose and 25 seconds later the box locks up entirely (full 
trace at https://gist.github.com/RantyDave/218f8f3bab74fa623677b450f372286e).

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  si  so pi po fr de sr bk bk lf rm   in   sy   cs us sy id
 0 0 6 67131172 119868 0  0  0  0  0 37512 804536 0 0 0 0 2488 596 722 0  9 91
 0 0 6 67131172 119868 0  0  0  0  0 33764 732708 0 0 0 0 2483 607 1307 0 10 90
 0 0 6 67131172 119868 0  0  0  0  0 30388 717851 0 0 0 0 2469 596 902 0  9 91

The machine will still ping, and you can type at the console, but you can't 
log in. This actually went a lot less well than I thought it would.

After power-cycling the machine, there's nothing in /cores or /var/cores. 
Edited highlights of the zone config (from zonecfg info):

zonename: ctr-quLfQMoeND3yrnZj5aWgeL
zonepath: /zones/ctr-quLfQMoeND3yrnZj5aWgeL
brand: lx
limitpriv: default
scheduling-class: 
ip-type: exclusive
hostid: 
fs-allowed: 
uuid: ac9b7109-536f-c632-dd37-8dddaab0cc4b
[max-lwps: 2000]
[max-shm-memory: 8G]
[max-shm-ids: 4096]
[max-msg-ids: 4096]
[max-sem-ids: 4096]
[cpu-shares: 100]
net:
        (snipped)

capped-memory:
        [physical: 8G]
        [swap: 8G]
        [locked: 8G]

(more snip)
attr:
        name: docker
        type: string
        value: true
attr:
        name: init-name
        type: string
        value: /native/usr/vm/sbin/dockerinit
attr:
        name: kernel-version
        type: string
        value: 3.16
rctl:
        name: zone.max-lwps
        value: (priv=privileged,limit=2000,action=deny)
rctl:
        name: zone.max-shm-memory
        value: (priv=privileged,limit=8589934592,action=deny)
rctl:
        name: zone.max-shm-ids
        value: (priv=privileged,limit=4096,action=deny)
rctl:
        name: zone.max-sem-ids
        value: (priv=privileged,limit=4096,action=deny)
rctl:
        name: zone.max-msg-ids
        value: (priv=privileged,limit=4096,action=deny)
rctl:
        name: zone.cpu-shares
        value: (priv=privileged,limit=100,action=none)
rctl:
        name: zone.max-physical-memory
        value: (priv=privileged,limit=8589934592,action=deny)
rctl:
        name: zone.max-swap
        value: (priv=privileged,limit=8589934592,action=deny)
rctl:
        name: zone.max-locked-memory
        value: (priv=privileged,limit=8589934592,action=deny)

Setting physical, swap and locked to the same value is what you get if you 
create a textbook lx zone using vmadm.
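For reference, I believe this is because vmadm defaults max_swap and 
max_locked_memory to the value of max_physical_memory when they aren't given, 
so a minimal payload along these lines (a sketch; the image_uuid is a 
placeholder) ends up with all three caps at 8G:

```json
{
  "alias": "textbook-lx",
  "brand": "lx",
  "kernel_version": "3.16",
  "image_uuid": "<uuid of an lx image>",
  "max_physical_memory": 8192
}
```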

Oh, and I set the ARC to be tiny...

[root@tiny ~]# arcstat
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c  
05:51:43     0     0      0     0    0     0    0     0    0    67M  512M  

And rcapd wasn't running. 

Running the test again with rcapd running, we get a better-looking zonememstat:

[root@tiny ~]# zonememstat
                                 ZONE  RSS(MB)  CAP(MB)  NOVER  POUT(MB) SWAP%
                               global      185        -      -         -     -
           ctr-wBHaBqWZvWFAp9huch5ioX     7185     8192      0         0 87.57
           ctr-FHTL4BQ9Wh63kFShD6M8L6     7181     8192      0         0 87.57
           ctr-AsFSSvYzomUFXJgTNSxkVH     7181     8192      0         0 87.57
           ctr-RZXUVpsgAzKKmakdPD3PZN     7181     8192      0         0 87.57

The output from ps is the same, and the effect of allocating the last gig is 
the same, but because I left vfsstat running we get another smoking gun:

  r/s   w/s  kr/s  kw/s ractv wactv read_t writ_t  %r  %w   d/s  del_t zone
 34.6   1.3  15.3   0.3   0.0   0.8    2.0 643107.4   0  81   0.0    0.0 global (0)
  0.0 161.8   0.0 18877.8   0.0   0.1    0.0  581.8   0   9   0.0    0.0 ctr-wBHa (1)
  0.0   0.0   0.0   0.0   0.0   0.0    0.0    0.0   0   0   0.0    0.0 ctr-FHTL (2)
  0.0   0.0   0.0   0.0   0.0   0.0    0.0    0.0   0   0   0.0    0.0 ctr-AsFS (3)
  0.0   0.0   0.0   0.0   0.0   0.0    0.0    0.0   0   0   0.0    0.0 ctr-RZXU (4)

2) The process eventually terminates with a SIGBUS.
The first time through I used Alpine Linux; this one used Debian.

What I was *expecting* was that either the allocation would return NULL 
(possibly causing the SIGBUS), a signal would be sent to the application (same 
result again), or a Linux-esque OOM killer would shoot it through the head. 
Either way, I would've thought that preventing a non-global zone from taking 
down the global zone would be job #1 :(
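For what it's worth, the failure I'd expect from Python at the cap is a 
MemoryError rather than a SIGBUS, which would at least be catchable. A minimal 
sketch of what I mean (hypothetical helper, not from the harness):

```python
def try_alloc(nbytes):
    """Allocate nbytes, returning None on failure instead of dying.

    A MemoryError here corresponds to the allocator returning NULL,
    which is the failure mode I expected when hitting the memory cap.
    """
    try:
        return bytearray(nbytes)
    except MemoryError:
        return None
```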

 I don't know if this is an issue with your application code or with our 
platform.
The application code is a test harness *just* for incrementally allocating 
memory: https://gist.github.com/RantyDave/c2322891f86f26f4696b3a8b3a478b62
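
The core of it is essentially just this (a simplified sketch, not the exact 
gist code; names are mine):

```python
import sys
import time

CHUNK = 100 * 1024 * 1024  # allocate in 100MiB steps

def allocate(target_bytes, chunk=CHUNK):
    """Keep allocating zero-filled chunks until target_bytes are held.

    bytearray() writes every page, so the memory ends up genuinely
    resident rather than merely reserved.
    """
    blocks = []
    total = 0
    while total < target_bytes:
        blocks.append(bytearray(chunk))
        total += chunk
    return blocks

def main():
    # e.g. `python3 incremental_alloc.py 7` to hold 7GB
    held = allocate(int(sys.argv[1]) * 1024 ** 3)
    time.sleep(3600)  # hold the allocation so it stays visible in RSS
```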

3) The box eventually locks up. That is clearly our issue and is something we 
would want to investigate. Can you force a system dump and provide that to us? 
If you can't NMI your box when it is in this state, then you might be able to 
force a dump using DTrace.
Sorry, I really can't help you there.

-Dave






-------------------------------------------
smartos-discuss