Hi,

I’m trying to set up *LinuxContainerExecutor (LCE)* in *YARN* on *Hadoop
3.3.6*.
We’re experiencing *"Bad page state" kernel errors*, which appear to be
caused by *high RAM + buffer/cache pressure*. To address this, we looked
into the YARN documentation on *Memory Control* and are particularly
interested in trying *Elastic Memory Control*.

Our system is running *Linux kernel version 6.8.0-59-generic*, which
uses *cgroup v2* by default.
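
For anyone who wants to confirm which cgroup hierarchy a NodeManager host
actually mounts, here is a minimal sketch. It relies on the fact that the root
of a cgroup v2 (unified) hierarchy exposes a cgroup.controllers file. The
function name detect_cgroup_version and the default mount point
/sys/fs/cgroup are assumptions for illustration, not part of Hadoop or YARN.

```python
from pathlib import Path


def detect_cgroup_version(root="/sys/fs/cgroup"):
    """Guess the cgroup version mounted at `root`.

    Returns 2 if `root` looks like a cgroup v2 (unified) root,
    1 if the directory exists but lacks the v2 marker file
    (assumed to be a v1 or hybrid layout), and None if `root`
    is not a directory at all.
    """
    root = Path(root)
    if not root.is_dir():
        return None
    # The root of a cgroup v2 hierarchy contains cgroup.controllers;
    # v1 mounts per-controller directories (memory/, cpu/, ...) instead.
    if (root / "cgroup.controllers").is_file():
        return 2
    return 1


if __name__ == "__main__":
    # Assumed default mount point; adjust if the host mounts cgroups elsewhere.
    print(detect_cgroup_version())
```

On a hybrid setup the v2 hierarchy is typically mounted in a subdirectory
rather than at the root, so this sketch reports such a host as v1.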

From what I’ve learned:

   - The config key
     yarn.nodemanager.linux-container-executor.cgroups.v2.enabled (and
     related functionality) was introduced in this commit:
     🔗 https://github.com/apache/hadoop/commit/910cb6b887c73641d8eadd79249dfaa852edd809

   - This commit exists only in trunk (3.5.0-SNAPSHOT) and is *not included
     in 3.3.6*.

   - Elastic memory control depends on cgroups, which appear unsupported
     under cgroup v2 in 3.3.6.

------------------------------
My questions:

   1. Is there any *stable Hadoop release (e.g., 3.4.x)* that *fully supports
      cgroup v2* with *LinuxContainerExecutor* and *Elastic Memory Control*?

   2. If I *must stay on Hadoop 3.3.6* and the OS uses *cgroup v2*, is there
      *any way to enable LCE*, and if so, how?

   3. Are there any *production-ready recommendations* for managing
      buffer/cache pressure to avoid kernel-level memory issues like "Bad page
      state"?

Any practical advice or experiences from production or testing environments
would be greatly appreciated.

Thanks a lot in advance,
*Elif Sinem*
