Mark Dixon <[email protected]> writes: > I started this work but noticed that it collides/duplicates > functionality with the newer stuff (e.g. see below).
I don't think it's a big problem, and there's not actually much code
involved.

> If the patchset goes anywhere but not mainline into Son of Gridengine,
> I'll post updates to one of the soge lists.

I'm sure we'll get the functionality in, even if it ends up looking a
bit different.  (The basis is there already, but I hadn't got it in a
state to push to the repo.)

>> They may or may not apply.  However, at least the configuration is
>> inconsistent with the cpuset stuff in 8.1.2 (and subsequent work).
>> We'll work it out, and I'm happy to hear opinions on whether it's best
>> to define the cgroup location via execd_params or by finding an
>> externally-created one (made at execd startup with knowledge about the
>> host concerned, which I think is an admin win).
>
> There are advantages to allowing the admin to specify the location of
> the cgroup or similar entity.  For example, I sometimes run two copies
> of gridengine on the same machines (production and
> development/testing) and want to keep the cgroups separate to avoid
> name clashes.
>
> Using my patchset, I can (and do) set
> CGROUP_MEMORY=/cgroup/memory/sge_prod on one installation and
> CGROUP_MEMORY=/cgroup/memory/sge_dev on the other.

Fair enough.  Would .../sge.$SGE_CELL work?  I think that way would be
easier, particularly with heterogeneous hosts with different ideas
about cgroup mounting.

> However, the mess that results from continually extending execd_params
> is clearly unsustainable.

Yes.  SLURM shows what's potentially required.

> We already have entries like execd_spool_dir and xterm in "qconf
> -mconf <host|global>" - how about adding new entries like
> "cgroup_memory", "cgroup_cpuset", etc.?  e.g. setting them to something
> like "none" or "false" could disable the relevant feature, "auto" or
> "true" to trigger your automatic code, or a specific path to override
> it.

I was planning to extend USE_CGROUPS to allow selecting the ones to
use, but I'm not sure how baroque to make this stuff.
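For what it's worth, the per-cell naming above could look roughly like
this.  This is only an illustrative sketch, not the actual execd code:
the /cgroup/memory mount point and the sge.$SGE_CELL naming come from
the examples in this thread, and the mkdir at the end is hypothetical.

```shell
#!/bin/sh
# Sketch: derive a memory cgroup path from the cell name, so that two
# installations on the same host (e.g. a prod cell and a dev cell)
# get distinct cgroups without per-installation execd_params.
SGE_CELL="${SGE_CELL:-default}"              # set by each installation
CGROUP_MEMORY="/cgroup/memory/sge.$SGE_CELL"
echo "$CGROUP_MEMORY"                        # e.g. /cgroup/memory/sge.default

# Creating it at execd startup would then be roughly:
#   mkdir -p "$CGROUP_MEMORY"
```

With SGE_CELL=prod and SGE_CELL=dev this yields /cgroup/memory/sge.prod
and /cgroup/memory/sge.dev, much like the two CGROUP_MEMORY settings
quoted above, but derived automatically.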
You'd hope it would just DTRT eventually, and that configuration won't
be necessary.

>> More importantly, there seem to be problems with what the memory
>> controller reports, but it's not clear in what versions, and how you
>> check.  I need to chase that up, but I'd be glad of information from
>> anyone who knows gory (historical) details of the memory controller.
>>
>> "This can may contain worms."
> ...
>
> Well said!  This is obviously a new area

Well, SLURM has been at it for a while, but it does seem as though
Linux hasn't been at it long enough.

> and, aside from the obvious
> problems, we may still get interesting interactions with some of the
> more exotic things regularly found in our environments (e.g. Lustre,
> InfiniBand, etc.).  It's going to be "fun" working it out...

Yes, some of the interactions aren't clear to me, at least.  SLURM
people should already have that sort of experience, but I haven't got
the contact I was hoping for.  I guess the Condor implementation is
less relevant in that respect.

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
