> On 27/04/2016, at 12:48 PM, Jerry Jelinek <[email protected]> wrote:
>
> Can you provide more information about what did not work when you set the
> zone's memory cap?

This afternoon I had an enormously overloaded but completely 'stock' Debian 8 lx zone, running on VMware Fusion, live-lock completely. It would ping but was otherwise totally unresponsive.

I've had the problem with two physical machines today, too. Both were 8GB machines with the zones capped at 4GB, with 4 and 8 physical cores, and the images were 20160330T234717Z and 20160414T011743Z.

I was running a make -j8 of mesos. I had iostat and vmstat running in the global zone. iostat shows...

   tty       lofi1        ramdisk1         sd0           sd1           cpu
 tin tout  kps tps serv  kps tps serv   kps tps serv  kps tps serv  us sy dt id
   0  168    4   1    1   64  16    0  8155 520    2    0   0    0   2 21  0 76
   0  328    0   0    0    0   0    0  6101 485    2    0   0    0   3 24  0 73
   0  167    0   0    0    0   0    0  2473 309    3    0   0    0   3 19  0 78
   0  171    0   0    0    0   0    0  7583 401    2    0   0    0   2 21  0 77
   0  169    4   1    2   64  16    0  6717 523    2    0   0    0   3 23  0 75
   0  170    0   0    0    0   0    0  6933 497    3    0   0    0   4 33  0 63
   0  170    0   0    0    0   0    0  5944 467    3    0   0    0   5 39  0 56

Expected behaviour for something that's thrashing - note in particular how little CPU time is available to userland.

From vmstat:

 kthr       memory                  page                    disk          faults        cpu
 r b   w    swap  free re   mf   pi   po   fr de     sr lf rm   s0 s1    in  sy     cs us sy id
 1 0 158 2131188 16344 17  652  788 1992 2192  0 416011  0  0  505  0 32474 660  75736  2 19 78
 0 0 158 2129876 16324 32  434  624 1996 2131  0 422930  0  0  432  0 28819 663  65128  2 20 78
 0 0 158 2128736 16336 25  623 1002 2213 2336  0 419447  0  0  445  0 30441 674  75889  4 29 67
 0 0 158 2127176 16320 32  655  905 2254 2579  0 424660  0  0  490  0 30580 666  74210  4 34 62
 1 0 158 2126084 16328 13  517  876 2064 2180  0 422137  0  0  747  0 37349 660  87374  3 23 75
 1 0 159 2124800 16344 20  607  829 2186 2338  0 414812  0  0  376  0 33842 672  78496  3 20 77
 0 0 159 2123412 16332 34  738 1155 2821 2925  0 420288  0  0  555  0 41055 659  95112  5 22 73
 0 0 159 2121728 16244 13  454  673 1597 1714  0 423595  0  0  405  0 25314 666  63283  3 36 61
 5 0 159 2120740 16348 66 2521 2331 1100 1268  0 346933  0  7  735  0 14111 707  66809 12 45 43
 5 0 159 2141400 25396 49 3875 4155 1876 1960  0 217768  0  0 1023  0 20443 699 105580 17 49 33
 0 0 159 2133576 16308 12  788 1296 2765 2977  0 440330  0  0  633  0 43350 659 107305  4 24 72
 0 0 159 2132144 16328 15  635  970 2148 2599  0 373470  0  0  498  0 33025 665  81605  4 31 66

Just 16MB free, which is presumably what is causing the thrashing in the first place.

While screeching to a halt is perhaps the expected behaviour, the unfortunate part is that it takes down the global zone as well - as in, the global zone becomes unresponsive, rendering the machine lost.

In the name of science I ran it on Joyent's public infrastructure. It was quite stoic and made it to the end of the build, but running a make -j 32 gives you lots of

g++: internal compiler error: Bus error (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.9/README.Bugs> for instructions.
Makefile:6342: recipe for target 'slave/libmesos_no_3rdparty_la-http.lo' failed
make[2]: *** [slave/libmesos_no_3rdparty_la-http.lo] Error 1

and no further prompt.
I did get a "the system is going down for poweroff" notification, however. Presumably the Joyent zone is running on something physically much larger, so the 'boom' was comparatively smaller. I have an image of the crash in the public infrastructure but can't for the life of me find the button to make it public :(

Is there anything stupid I've done? I've repeated this on four separate platforms now, so it's not some stupid hardware error. Do I just have excessive expectations?

Thanks,
Dave
