Hi, I’m trying to set up *LinuxContainerExecutor (LCE)* in *YARN* on *Hadoop 3.3.6*. We’re experiencing *"Bad page state" kernel errors*, which appear to be caused by *high RAM + buffer/cache pressure*. To address this, we looked into the YARN documentation on *Memory Control* and are particularly interested in trying *Elastic Memory Control*.
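Because Elastic Memory Control relies on the NodeManager's cgroups support, it helps to first confirm which cgroup hierarchy each node actually mounts. The helper below is a minimal sketch. It assumes the standard Linux `/proc/mounts` format (device, mount point, filesystem type, ...), and the name `cgroup_mode` is hypothetical. A mount of type `cgroup2` with no `cgroup` (v1) mounts means a unified v2 layout. If both types are present, the node is in hybrid mode. On a unified host, `stat -fc %T /sys/fs/cgroup` should also report `cgroup2fs`.

```python
from pathlib import Path


def cgroup_mode(mounts_text: str) -> str:
    """Classify the cgroup layout from the text of /proc/mounts (hypothetical helper).

    Returns "v2" (unified hierarchy only), "hybrid" (v1 controllers plus a
    cgroup2 mount), "v1" (legacy hierarchy only), or "unknown".
    """
    v1_seen = False
    v2_seen = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        fstype = fields[2]  # third field is the filesystem type
        if fstype == "cgroup2":
            v2_seen = True
        elif fstype == "cgroup":
            v1_seen = True
    if v2_seen and not v1_seen:
        return "v2"
    if v1_seen and v2_seen:
        return "hybrid"
    if v1_seen:
        return "v1"
    return "unknown"


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():  # only meaningful on a Linux host
        print("cgroup mode:", cgroup_mode(mounts.read_text()))
```

On an Ubuntu host with kernel 6.8 and default boot parameters, this would typically print `v2`.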
Our system is running *Linux kernel version 6.8.0-59-generic*, which uses *cgroup v2* by default.

From what I’ve learned:
- The config key yarn.nodemanager.linux-container-executor.cgroups.v2.enabled (and related functionality) was introduced in this commit:
  🔗 https://github.com/apache/hadoop/commit/910cb6b887c73641d8eadd79249dfaa852edd809
- This commit exists only in trunk (3.5.0-SNAPSHOT) and is *not included in 3.3.6*.
- Elastic Memory Control depends on cgroups, which appear to be unsupported under cgroup v2 in 3.3.6.

------------------------------

My questions:
1. Is there any *stable Hadoop release (e.g., 3.4.x)* that *fully supports cgroup v2* with *LinuxContainerExecutor* and *Elastic Memory Control*?
2. If I *must stay on Hadoop 3.3.6* and the OS uses *cgroup v2*, is there *any way to enable LCE*, and if so, how?
3. Are there any *production-ready recommendations* for managing buffer/cache pressure to avoid kernel-level memory issues like "Bad page state"?

Any practical advice or experiences from production or testing environments would be greatly appreciated.

Thanks a lot in advance,
*Elif Sinem*