Hello. I have been spending my time addressing a thorny issue caused by the behavior of the Linux kernel's memory allocator. In short, a local unprivileged user can trivially lock up a Linux system unless memory cgroups are used appropriately. A very, very long summary (an index to LWN.net articles) is at http://marc.info/?l=linux-kernel&m=143239201905479 .
When I was working at NTT Open Source Software Center, I was mainly in charge of troubles caused by Linux kernels, especially kernel panics, unexpected reboots and hangs. Since this issue came up, I have suspected that some of those unexpected hangs were caused by it. (Unfortunately, I was not in time to tell customers how to collect debugging information before they pressed the reset button on their servers. Thus, I have no evidence that they were actually hitting this issue.)
Currently, Michal Hocko is trying to reduce the possibility of hitting this issue by allowing memory allocation requests without the __GFP_FS flag to fail. But this approach needs to be tested very carefully, because a cleanup patch which unexpectedly allowed memory allocation requests without the __GFP_FS flag to fail resulted in unstable systems. In my view, fixes for any unexpected fallout from Michal's approach will arrive too late for customers to apply, because they want to stay on a specific kernel version for as long as possible. (Well, that is part of why I was in charge of kernel panics and unexpected reboots.) I think that introducing a proactive countermeasure (like "use in-kernel access restriction mechanisms such as SELinux") is an approach which customers can choose before they decide which specific kernel version to use. (That is the abovementioned patch.)
Speaking of memory allocation requests without the __GFP_FS flag, in-kernel access restriction mechanisms (including TOMOYO/AKARI/CaitSith) use them. This means that if we go with Michal's approach, access requests from user space will start failing with an ENOMEM error when memory is tight. It is unfortunate if access requests by critical processes fail because of an inconsequential process's memory consumption (whereas /proc/$pid/oom_score_adj can protect critical processes from inconsequential processes).
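
For readers who have not used it, here is a minimal sketch of the /proc/$pid/oom_score_adj protection mentioned above. The function name set_oom_score_adj and its proc_root parameter are hypothetical, introduced only for this illustration; the -1000..1000 range is the one the kernel documents for this file. Note that this only influences which process the OOM killer picks. It does not stop an allocation made on behalf of a critical process from failing with ENOMEM.

```python
import os

# Documented range for /proc/<pid>/oom_score_adj.
OOM_SCORE_ADJ_MIN = -1000  # never selected by the OOM killer
OOM_SCORE_ADJ_MAX = 1000   # most preferred victim


def set_oom_score_adj(value, pid="self", proc_root="/proc"):
    """Write an OOM score adjustment for one process and return the path used.

    proc_root is a hypothetical parameter (not a kernel interface) so that
    the function can be tried against a scratch directory instead of /proc.
    On a real system, lowering the value typically needs CAP_SYS_RESOURCE.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path
```

For example, a privileged supervisor might call set_oom_score_adj(-1000, pid=some_pid) for a critical daemon, and an unprivileged process can raise its own value with set_oom_score_adj(500) to volunteer as a victim.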
Isolating all processes into appropriate memory cgroups would be something like restricting all processes with SELinux, which is not easy (and impossible for most systems). Rather than changing the default behavior (implicitly applying __GFP_NORETRY internally), I prefer fixing callers (adding __GFP_NORETRY to callers) step by step after adding a proactive countermeasure. I have no idea how this story is going to end...
_______________________________________________
tomoyo-users-en mailing list
tomoyo-users-en@lists.osdn.me
http://lists.osdn.me/mailman/listinfo/tomoyo-users-en