On 09/17/2016 07:30 PM, Joshua M. Clulow wrote:
On 17 September 2016 at 15:39, Daniel Carosone
I tried this a couple of weeks ago, with no luck either. I was getting some
(apparently random) SIGBUS backtraces from ruby in amongst the logs.
Linux allows for pretty flagrant memory over-commit; that is, programs
may allocate a lot more memory than what is available in the system.
Historically, our operating system (illumos, as with Solaris before
it) has _not_ allowed for over commit. Instead, we account for all
memory allocated, such that we can't get into a situation where there
is a "run on the bank": where programs try to use some of the
allocation we granted them, but we've dropped the ball and run out of
actual RAM to give them.
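
To make the difference concrete, here is a toy sketch of the two policies. It is not how illumos or Linux implements anything; the ToyMemory class and its page counts are invented purely for illustration.

```python
class ToyMemory:
    """Toy model of a fixed pool of pages (hypothetical, not kernel code)."""

    def __init__(self, total_pages, overcommit):
        self.total_pages = total_pages
        self.overcommit = overcommit
        self.reserved = 0  # pages promised to programs
        self.touched = 0   # pages actually backed by RAM or swap

    def allocate(self, pages):
        # Strict accounting: refuse up front if the promise cannot be kept.
        if not self.overcommit and self.reserved + pages > self.total_pages:
            raise MemoryError("ENOMEM at allocation time")
        self.reserved += pages

    def touch(self, pages):
        # With overcommit, the shortfall only shows up when memory is used.
        if self.touched + pages > self.total_pages:
            raise RuntimeError("run on the bank: no pages left at use time")
        self.touched += pages
```

With overcommit=False, a second allocate(60) against a 100-page pool fails immediately with an explicit error. With overcommit=True, both allocations succeed and the failure is deferred until the second touch(60).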
When Linux systems hit real memory pressure, a deeply questionable
facility steps in: the OOM (out of memory) killer. This facility
takes no prisoners, terminating processes until the situation
The SIGBUS you're seeing is potentially the result of our attempt to
get _some_ of the way toward overcommit, so that Linux software can
work even when it mistakenly assumes memory is an infinite resource.
You have some level of control over this in Linux.
If it is set to zero, you don't have to worry about overallocation and
the wild west of OOM. Note that most Linux systems log an OOM message in
the dmesg output when the OOM killer is invoked. But yes, it is unwise to allow it to
You can play with vm.overcommit_ratio (set it to 90; it usually defaults to
50). In strict mode (vm.overcommit_memory = 2), Linux then limits total
commitments to swap plus this percentage of physical RAM. And then don't
give it swap.
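
As a rough sketch of that arithmetic: the Linux kernel documentation describes the strict-mode commit limit as swap plus overcommit_ratio percent of physical RAM (less any huge pages). The function name and the example sizes below are made up for illustration, and the kernel's own rounding may differ slightly.

```python
def commit_limit_kib(ram_kib, swap_kib, overcommit_ratio, hugetlb_kib=0):
    """Approximate CommitLimit in KiB when vm.overcommit_memory = 2.

    Hypothetical helper: swap plus a percentage of non-hugepage RAM.
    """
    return swap_kib + (ram_kib - hugetlb_kib) * overcommit_ratio // 100


# Example: a machine with 16 GiB of RAM and no swap (assumed sizes).
ram = 16 * 1024 * 1024  # KiB
print(commit_limit_kib(ram, 0, 50))  # default ratio of 50
print(commit_limit_kib(ram, 0, 90))  # ratio raised to 90
```

On a running Linux system, the value the kernel actually computed can be compared against the CommitLimit line in /proc/meminfo.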
This said, SIGBUS is not usually memory allocation related ... this is
more typically a memory alignment issue. Google is your friend here.
We force the MAP_NORESERVE flag for some mmap() allocations made by LX
branded processes. If there is a subsequent run on the bank, we
kill the process that would have overrun the swap cap in the zone with
SIGBUS, rather than random other processes.
If you're routinely seeing SIGBUS, it's possible that your swap cap
(and probably your RSS cap) should be higher for your workload. With
the same software instead running in a native (non-LX) zone, you would
probably have seen ENOMEM errors, or a failure to fork(2) new
processes. Native zones do strict accounting at the time of
allocation, so that we can report explicit errors about the memory
This isn't likely a linux issue per se, looks more to be an application
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
p: +1 734 786 8423 x121
c: +1 734 612 4615