On Thu, 20 Sep 2012, Dave Love wrote:
...
Fair enough.  Would .../sge.$SGE_CELL work?  I think that way would be
easier, particularly with heterogeneous hosts with different ideas about
cgroup mounting.

I've never used more than one cell at once - do they have independent job IDs (jids)? If so, that sounds like a good start.

However, if I were to start using multiple cells, I'd want to replicate that in the dev environment too, so I'm not sure this helps. Would using SGE_CLUSTER_NAME help instead? That said, I'm not sure it is currently used anywhere beyond how the installer creates init scripts...
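
To make the naming question concrete, here is a rough sketch of how the per-installation directory name could be derived from either variable. The helper name and its arguments are hypothetical and not part of SGE; the parent directory (the "..." above) is deliberately left to the caller, since hosts differ in where they mount cgroups.

```python
import os


def sge_cgroup_dirname(env=None, prefer_cluster_name=False):
    """Return a directory name such as 'sge.<cell>' (hypothetical helper).

    env                 -- mapping to read variables from (defaults to os.environ)
    prefer_cluster_name -- use SGE_CLUSTER_NAME instead of SGE_CELL
    """
    env = os.environ if env is None else env
    key = "SGE_CLUSTER_NAME" if prefer_cluster_name else "SGE_CELL"
    value = env.get(key)
    if not value:
        raise KeyError(f"{key} is not set")
    # Refuse values that would escape the parent cgroup directory.
    if "/" in value or value in (".", ".."):
        raise ValueError(f"{key}={value!r} is not usable as a directory name")
    return f"sge.{value}"
```

The caller would join the result onto whatever cgroup mount point the host actually uses, so a production and a dev installation only collide if they share the same cell (or cluster) name.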


...
and, aside from the obvious
problems, we may still get interesting interactions with some of the
more exotic things regularly found in our environments (e.g. Lustre,
InfiniBand, etc.). It's going to be "fun" working it out...

Yes, some of the interactions aren't clear to me, at least.  SLURM
people should already have that sort of experience, but I haven't got
the contact I was hoping for.  I guess the Condor implementation is less
relevant in that respect.
...

I think most of my problems so far are related to some crazy Lustre client modules that reimplement some of the kernel's low-level memory functionality without going through the necessary cgroup code. I think William mentioned this on the list a while back.

There are a few threads out there about it going back a year or two - I'll have a poke to see if there was any conclusion.

Mark
--
-----------------------------------------------------------------
Mark Dixon                       Email    : [email protected]
HPC/Grid Systems Support         Tel (int): 35429
Information Systems Services     Tel (ext): +44(0)113 343 5429
University of Leeds, LS2 9JT, UK
-----------------------------------------------------------------
