[gentoo-user] Re: OOM memory issues
Kerin Millar kerframil at fastmail.co.uk writes:

> The need for the OOM killer stems from the fact that memory can be
> overcommitted. These articles may prove informative:
>
> http://lwn.net/Articles/317814/

Yes, I saw this article. It's dated February 4, 2009. How much has changed
with the kernel/configs/userspace mechanism since then? Nothing, everything?

> http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

Nice to know.

> In my case, the most likely trigger - as rare as it is - would be a
> runaway process that consumes more than its fair share of RAM. Therefore,
> I make a point of adjusting the score of production-critical applications
> to ensure that they are less likely to be culled.

OK, I see the manual tools for the OOM killer. Are there any graphical
tools for monitoring, configuring, and controlling OOM-related files and
target processes? Is all of this performed by hand?

> If your cases are not pathological, you could increase the amount of
> memory, be it by additional RAM or additional swap [1]. Alternatively, if
> you are able to precisely control the way in which memory is allocated
> and can guarantee that it will not be exhausted, you may elect to disable
> overcommit, though I would not recommend it.

I do not have a problem; the topic just keeps popping up in my clustering
research, frequently. Many clustering environments have heavy memory
requirements, so this will eventually need to be monitored, diagnosed and
managed in real time by the cluster software, for example as part of load
balancing. These are very new technologies, hence my need to understand
both legacy and current issues and solutions. You cannot always just add
resources; once set up, you have to dynamically manage resource
consumption, or at least that is what the current reading reveals.

> With NUMA, things may be more complicated because there is the potential
> for a particular memory node to be exhausted, unless memory interleaving
> is employed.
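The score adjustment Kerin describes is done by writing to
/proc/<pid>/oom_score_adj. A minimal sketch (the helper names and the
example score are my own, not from any tool he mentions):

```python
import os

OOM_SCORE_ADJ_MIN, OOM_SCORE_ADJ_MAX = -1000, 1000

def clamp_score(score):
    """Clamp a requested adjustment to the range the kernel accepts."""
    return max(OOM_SCORE_ADJ_MIN, min(OOM_SCORE_ADJ_MAX, score))

def protect_process(pid, score=-500):
    """Make a production-critical process a less likely OOM victim.

    Lowering the score below its current value normally requires
    root/CAP_SYS_RESOURCE, so this may raise PermissionError for
    unprivileged users.
    """
    with open(f"/proc/{pid}/oom_score_adj", "w") as f:
        f.write(str(clamp_score(score)))

# e.g. protect_process(mysqld_pid)   # mysqld_pid is hypothetical here
```

A positive score makes a process a preferred victim instead; -1000
effectively exempts it from the OOM killer altogether.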
> Indeed, I make a point of using interleaving for MySQL, having gotten the
> idea from the Twitter fork.

Well, my first cluster is just (3) AMD FX-8350 machines with 32G of RAM
each. Once that is working reasonably well, I'm sure I'll be adding
different (multi)processors to the mix, with different RAM
characteristics. There is a *huge interest* in heterogeneous clusters,
including but not limited to GPU/APU hardware. So dynamic, real-time
memory management is quintessentially important for successful clustering.

> Finally, make sure you are using at least Linux 3.12, because some
> improvements have been made there [2].

Yep.

[1] I always set up gigs of swap and rarely use it, for critical
computations that must be fast. Many cluster folks are building systems
with both SSD and traditional (RAID) HD setups. The SSD could be
partitioned for the cluster and for swap. Lots of experimentation on how
best to deploy SSDs alongside maximum RAM in cluster systems is ongoing.
Memory management is a primary focus of Apache Spark's (in-memory)
computations. Spark can be used with Python, Java and Scala, so it is very
cool.

> --Kerin
>
> [1] At a pinch, additional swap may be allocated as a file
> [2] https://lwn.net/Articles/562211/#oom

(2) is also good to know.

thx,
James
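For watching whether that swap actually gets touched, /proc/meminfo can
be polled; a rough sketch (a simple parser of the "Key: value kB" lines
as they appear on Linux):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info

def swap_used_kb(meminfo):
    """Swap currently in use, in kB."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]

# On a live system:
# with open("/proc/meminfo") as f:
#     print(swap_used_kb(parse_meminfo(f.read())), "kB of swap in use")
```

A cluster monitor would poll this per node; the field names shown are the
standard ones, but production code should tolerate fields that are absent
on older kernels.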
Re: [gentoo-user] Re: OOM memory issues
On 18/09/2014 19:27, James wrote:
> Kerin Millar kerframil at fastmail.co.uk writes:
>> The need for the OOM killer stems from the fact that memory can be
>> overcommitted. These articles may prove informative:
>> http://lwn.net/Articles/317814/
>
> Yes, I saw this article. It's dated February 4, 2009. How much has
> changed with the kernel/configs/userspace mechanism? Nothing, everything?

A new tunable, oom_score_adj, was added, which accepts values between
-1000 and 1000.

https://github.com/torvalds/linux/commit/a63d83f#include/linux/oom.h

As mentioned there, the oom_adj tunable remains for reasons of backward
compatibility. Setting one will adjust the other per the appropriate
scale. It doesn't look as though Karthikesan's proposal for a cgroup-based
controller was ever accepted.

--Kerin
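The scaling mentioned here maps the legacy oom_adj range (-17..15) onto
the new -1000..1000 range. A sketch of the arithmetic (a linear rescale
with exact endpoints, approximating what the kernel does; not a copy of
the actual fs/proc code):

```python
import math

OOM_DISABLE, OOM_ADJUST_MAX = -17, 15
OOM_SCORE_ADJ_MAX = 1000

def oom_adj_to_score_adj(oom_adj):
    """Approximate how a write to the legacy oom_adj is reflected in
    oom_score_adj: the maximum maps exactly, everything else is a
    truncating linear rescale."""
    if oom_adj == OOM_ADJUST_MAX:
        return OOM_SCORE_ADJ_MAX
    # C-style truncating division of oom_adj * 1000 / 17
    return math.trunc(oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE)
```

So the old "disable" value of -17 lands on -1000, which is the value that
exempts a process under the new scheme.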
[gentoo-user] Re: OOM memory issues
Kerin Millar kerframil at fastmail.co.uk writes:

> A new tunable, oom_score_adj, was added, which accepts values between
> -1000 and 1000.
>
> https://github.com/torvalds/linux/commit/a63d83f#include/linux/oom.h

FANTASTIC! Exactly the sort of info I'm looking for: learn the past, see
what has been tried, how to configure it, and when and why it works or
fails! Absolutely wonderful link!

> As mentioned there, the oom_adj tunable remains for reasons of backward
> compatibility. Setting one will adjust the other per the appropriate
> scale.

That said, the mechanism seems too simple-minded to succeed in anything
but an extremely well-monitored system. I think the effort now,
particularly in clustering codes, is to keep only basic memory monitoring
and control in the kernel and leave the fine-grained memory-control needs
to the clustering tools. The simple solution there (in clustering) is that
you just prioritize jobs (codes), migrate them to systems with spare
resources, and bump other processes to lower-priority states. Also, there
are (in-memory) codes like Apache Spark that use Resilient Distributed
Datasets (RDDs).

> It doesn't look as though Karthikesan's proposal for a cgroup-based
> controller was ever accepted.
>
> --Kerin

I think many of the old kernel ideas, accepted or not, are being
repackaged in the clustering tools, or at least the clustering tools are
inspired by these codes. Dude, YOU are the main{}. Keep the info flowing,
as I'm sure lots of folks on this list are reading this. EXCELLENT!

James
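The "migrate to systems with spare resources" idea can be illustrated
with a toy placement function (entirely hypothetical, not the API of any
real scheduler such as Mesos):

```python
def pick_node(nodes, required_mb):
    """Pick the node with the most free memory that can fit the job.

    nodes: {node_name: free_mb}.  Returns None if nothing fits, i.e.
    the job must wait or another job must be bumped to a lower priority.
    """
    candidates = {name: free for name, free in nodes.items()
                  if free >= required_mb}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Example: three FX-8350 boxes with 32G each, variously loaded.
cluster = {"node-a": 4096, "node-b": 20480, "node-c": 12288}
```

Real schedulers weigh CPU, locality and priority too, but memory-aware
placement like this is the piece that makes kernel-level OOM handling a
last resort rather than the primary mechanism.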
[gentoo-user] Re: OOM memory issues
Rich Freeman rich0 at gentoo.org writes:

> A big problem with Linux along these fronts is that we don't really have
> good mechanisms for prioritizing memory use. You can set hard limits of
> course, which aren't flexible, but otherwise software is trusted to just
> guess how much RAM it should use.

Exactamundo! Besides fine-grained controls, I want it all in a nice fat
controllable GUI! Clustering is where it's at. Much of the fuss I read in
the clustering groups, particularly around Spark and other in-memory
tools, is all about monitoring and managing all types of memory and
related issues. [1]

> It would be nice if processes could allocate cache RAM, which could be
> preferentially freed if the kernel deems necessary. If some pages are
> easier to regenerate than to swap, this could also be flagged (I have a
> 50Mbps connection - I'd rather see my browser re-fetch pages than go to
> disk when the disk is already busy). There are probably a lot of other
> ways that memory use could be optimized with hinting.

I think you need to look into Apache Spark. It is exploding. Technology to
run certain codes 100% in memory looks to be a revolution, driven by
Mesos/Spark clusters. [2] The weapons on top of Mesos/Spark are Python,
Java and Scala (all in portage).

hth,
James

[1] https://issues.apache.org/jira/browse/SPARK-3535
[2] https://amplab.cs.berkeley.edu/
http://radar.oreilly.com/2014/06/a-growing-number-of-applications-are-being-built-with-spark.html
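Rich's "hard limits which aren't flexible" can be seen directly with
resource limits: cap the address space and a large allocation simply
fails, with no negotiation. A small Unix-only demonstration (the limit
value is arbitrary):

```python
import resource

def demo_hard_limit(limit_bytes=512 * 1024 * 1024):
    """Cap our address space at ~512 MiB, then attempt a 1 GiB
    allocation, which the kernel refuses outright (MemoryError).
    Restores the original soft limit afterwards."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    try:
        _ = bytearray(1 << 30)  # try to grab 1 GiB
        return "allocation succeeded"
    except MemoryError:
        return "allocation refused by RLIMIT_AS"
    finally:
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

This is exactly the inflexibility being criticised: the kernel cannot
tell a cache allocation it could reclaim from working memory it could
not, which is where the hinting ideas (and cgroup memory controllers)
come in.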