On Thu, 20 Sep 2012, Dave Love wrote:
...
Fair enough. Would .../sge.$SGE_CELL work? I think that way would be
easier, particularly with heterogeneous hosts with different ideas about
cgroup mounting.
I've never used more than one cell at once; do they have independent
jids? If so, that sounds like a good start.
However, if I were to start using multiple cells, I'd want to replicate
that in the dev environment too, so I'm not sure this helps. Would
SGE_CLUSTER_NAME help? I'm not sure it is currently used anywhere
beyond the way the installer names the init scripts, though...
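A minimal sketch of the per-cell naming idea, assuming each job's cgroup directory sits under an `sge.$SGE_CELL` directory below whatever cgroup mount point a given host uses. The function name `job_cgroup_path`, the `<jid>.<taskid>` leaf naming and the `/cgroup/cpuset` mount point are hypothetical illustrations, not anything Grid Engine currently does. It shows why the cell name matters if cells do have independent jids: the same jid in two cells ends up in two different directories.

```python
import os
from pathlib import PurePosixPath
from typing import Optional


def job_cgroup_path(base: str, job_id: int, task_id: Optional[int] = None,
                    cell: Optional[str] = None) -> PurePosixPath:
    """Hypothetical helper: build a per-job cgroup path namespaced by cell.

    `base` is the host's cgroup mount point (an assumption; hosts may
    mount cgroups in different places, so it is passed in, not hard-coded).
    """
    # Fall back to $SGE_CELL, then to SGE's conventional "default" cell name.
    if cell is None:
        cell = os.environ.get("SGE_CELL", "default")
    # Array-job tasks share a jid, so append the task ID when there is one.
    leaf = f"{job_id}.{task_id}" if task_id is not None else str(job_id)
    return PurePosixPath(base) / f"sge.{cell}" / leaf


# The same jid in two different cells maps to two different directories.
print(job_cgroup_path("/cgroup/cpuset", 42, cell="default"))  # /cgroup/cpuset/sge.default/42
print(job_cgroup_path("/cgroup/cpuset", 42, cell="dev"))      # /cgroup/cpuset/sge.dev/42
```

Passing the mount point in rather than hard-coding it is meant to cope with heterogeneous hosts that mount cgroups differently; only the `sge.<cell>` component is fixed.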
...
and, aside from the obvious
problems, we may still get interesting interactions with some of the
more exotic things regularly found in our environments (e.g. Lustre and
InfiniBand). It's going to be "fun" working it out...
Yes, some of the interactions aren't clear to me, at least. SLURM
people should already have that sort of experience, but I haven't made
the contact I was hoping for. I guess the Condor implementation is less
relevant in that respect.
...
I think most of my problems so far are related to some crazy Lustre client
modules that reimplement some of the kernel's low-level memory
functionality without going through the necessary cgroup code. I think
William mentioned this on the list a while back.
There are a few threads about it going back a year or two; I'll have a
poke around to see whether they reached any conclusion.
Mark
--
-----------------------------------------------------------------
Mark Dixon Email : [email protected]
HPC/Grid Systems Support Tel (int): 35429
Information Systems Services Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users