Hi Jan! Thanks for your explanation. I'm glad that works for you! :-) https://issues.apache.org/jira/browse/YARN-5202 is something that Yahoo! talked about at the Hadoop Summit (and it seems the community may be going in a similar direction, although not exactly the same). There's also https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandler.java . Ideally, at my company we'd also like memory limits to be imposed by cgroups, because we have had the OOM-killer wreak havoc a couple of times, but as far as I know that is not an option yet.
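For context, here's a rough sketch of what imposing such a limit through the cgroups v1 memory controller could look like. The cgroup root path, class name, and container id are just illustrative assumptions, not the YARN implementation:

    // Hypothetical sketch (not the YARN API): capping a container's memory
    // by writing limit_in_bytes in the cgroups v1 memory controller.
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class CGroupsMemoryLimiter {

        // Root of the memory cgroup hierarchy; this path is an assumption
        // and varies by distribution and mount configuration.
        private static final String MEMORY_CGROUP_ROOT =
            "/sys/fs/cgroup/memory/hadoop-yarn";

        /**
         * Caps the container's memory at limitBytes, so the kernel enforces
         * the limit on the cgroup instead of a user-space monitor killing
         * the container after the fact.
         */
        public static void setMemoryLimit(String containerId, long limitBytes)
                throws IOException {
            Path containerDir = Paths.get(MEMORY_CGROUP_ROOT, containerId);
            Files.createDirectories(containerDir);
            Path limitFile = containerDir.resolve("memory.limit_in_bytes");
            Files.write(limitFile,
                Long.toString(limitBytes).getBytes(StandardCharsets.UTF_8));
        }

        public static void main(String[] args) throws IOException {
            // Example: limit a hypothetical container to 2 GiB.
            setMemoryLimit("container_0001_01_000002", 2L * 1024 * 1024 * 1024);
        }
    }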
Cheers
Ravi

On Wed, Aug 10, 2016 at 1:54 AM, Jan Lukavský <[email protected]> wrote:

> Hi Ravi,
>
> we don't run into the situation where memory used > RAM, because the memory
> configured to be used by all containers on a node is less than the total
> amount of memory (by a factor of, say, 10%). The spikes of container memory
> usage that are tolerated due to the averaging don't happen on all
> containers at once, but are more of a random nature, so mostly only a
> single running container "spikes" at a time, which doesn't cause any
> issues. To fully answer your question, we have overcommit enabled and
> therefore, if we did run out of memory, bad things would happen. :) We
> are aware of that. The risk of running into the OOM-killer can be controlled
> by the averaging window length - as the length grows, more and more spikes
> are tolerated. Setting the averaging window length to 1 switches this
> feature off, turning it back into the "standard" behavior, which is why I
> see it as an extension of the current approach, which could be interesting
> to other people as well.
>
> Jan
>
>
> On 10.8.2016 02:48, Ravi Prakash wrote:
>
> Hi Jan!
>
> Thanks for your contribution. In your approach, what happens when a few
> containers on a node are using "excessive" memory (so that total memory
> used > RAM available on the machine)? Do you have overcommit enabled?
>
> Thanks
> Ravi
>
> On Tue, Aug 9, 2016 at 1:31 AM, Jan Lukavský <[email protected]>
> wrote:
>
>> Hello community,
>>
>> I have a question about container resource calculation in the nodemanager.
>> Some time ago I filed JIRA https://issues.apache.org/jira/browse/YARN-4681,
>> which I thought might address our problems with containers being killed
>> because of read-only mmapped memory blocks. The JIRA has not been resolved
>> yet, but it turned out for us that the patch doesn't solve the problem.
>> Some applications (namely Apache Spark) tend to allocate really large
>> memory blocks outside the JVM heap (using mmap, but with MAP_PRIVATE),
>> but only for short time periods. We solved this by creating a smoothing
>> resource calculator, which averages the memory usage of a container over
>> some time period (say 5 minutes). This eliminates the problem of a
>> container being killed for a short memory consumption peak, but at the
>> same time preserves the ability to kill a container that *really*
>> consumes an excessive amount of memory.
>>
>> My question is: does this seem a systematic approach to you, and should I
>> post our patch to the community, or am I thinking in the wrong direction
>> from the beginning? :)
>>
>>
>> Thanks for reactions,
>>
>> Jan
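To make the averaging approach Jan describes concrete, a minimal sketch of the idea follows: keep a sliding window of memory-usage samples per container and only flag it as over-limit when the windowed average exceeds the limit. Class and method names are illustrative assumptions, not the actual patch:

    // Hypothetical sketch of a "smoothing" memory check: a container is only
    // considered over-limit when its *average* usage over the last N samples
    // exceeds the limit, so short allocation spikes are tolerated.
    import java.util.ArrayDeque;
    import java.util.Deque;

    public class SmoothingMemoryCalculator {

        private final int windowSize;                 // samples to average over
        private final Deque<Long> samples = new ArrayDeque<>();
        private long runningSum = 0;

        public SmoothingMemoryCalculator(int windowSize) {
            // A window of 1 degenerates to the standard instantaneous check.
            this.windowSize = Math.max(1, windowSize);
        }

        /** Records one memory-usage sample (in bytes) for the container. */
        public void addSample(long usedBytes) {
            samples.addLast(usedBytes);
            runningSum += usedBytes;
            if (samples.size() > windowSize) {
                runningSum -= samples.removeFirst();
            }
        }

        /**
         * True only if the windowed average exceeds the configured limit,
         * so transient peaks (e.g. short-lived mmaps) do not get the
         * container killed, while sustained over-use still does.
         */
        public boolean isOverLimit(long limitBytes) {
            if (samples.isEmpty()) {
                return false;
            }
            long average = runningSum / samples.size();
            return average > limitBytes;
        }
    }

With a monitoring interval of a few seconds, a 5-minute window corresponds to on the order of a hundred samples, so a spike lasting one or two intervals barely moves the average, while a container that stays over its limit for most of the window still trips the check.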
