Hi, I have made a first patch against slurm-2.3 to move the IO setup as explained in the previous emails.
You will find it on my github here : https://github.com/hautreux/slurm/commit/632de91534a723f4d84d59ab3fc5676e88f0f944 I have also removed the partial privileges drop before pam_setup as it was not obvious for me why this should be done right there. IMHO, it could prevent from doing some actions in the PAM stack due to lack of some root privileges. If you think it is mandatory, I can add it again. I have ran the testsuite and it does not seem to bring regressions, at least on my simulated cluster. However, it clearly needs multiple reviews and feedbacks and is not intended to be used in production before more validation tests. Moe, Danny, Mark, if you can take a look on it, it would be great. Such a patch is more intended to be landed in 2.4pre to have more time before a stable release including that kind of modification in the job management mechanism. Carlos, it would be really interesting if you could testl a version of slurm including this patch on your system and give us your feedback. I am pretty confident that it should bring the functionality you are looking for. Thanks in advance for your feedback. Regards, Matthieu 2012/1/12 Mark A. Grondona <mgrond...@llnl.gov>: > On Thu, 12 Jan 2012 15:42:35 +0100, Matthieu Hautreux > <matthieu.hautr...@cea.fr> wrote: >> On 01/12/2012 01:32 AM, Mark A. Grondona wrote: >> > On Wed, 11 Jan 2012 17:23:34 -0700, Moe Jette<je...@schedmd.com> wrote: >> >> > Matthieu, >> >> > >> >> > I do not see any problem moving the code as you suggest, although it >> >> > will prevent error messages being sent to the srun command from some >> >> > possible failure conditions. The change could also result in failures >> >> > that are not obvious to us looking at the code right now. >> >> > >> >> > Something that we might consider is adding a new spank plugin function >> >> > to be made at an earlier point in the slurmstepd code. That should be >> >> > a simple and low risk change. >> > spank_init() is where the plugins are actually loaded by plugstack.c, >> > so it would have to be that call which is relocated. >> > >> > Mattheiu, is the relevant auks call made from a slurm_spank_init() >> > callback, or slurm_spank_init_post_opt()? >> > >> > One thing I might want to do is break spank_init_post_opt() out of >> > spank_init() on the slurmd side, so that it could be run later >> > in the code (after IO files are set up). However, if auks or other >> > security related plugins need to handle options from the user >> > before IO can be set up, this might not work. >> The relevant auks call is made from a slurm_spank_init(). >> >> I think that the main problem here is that we definitely need to >> initialize security related spank plugins and the PAM stack prior to try >> to open any files in user context (including IO set up which can result >> in opening stdout/stderr files). Multiple scenarios would require that, >> like using NFSv4, AFS, enforcing a particular selinux policy for user's >> files management... I am trying to figure out how to shift calls for >> that purpose. >> >> The place where the CWD is changed (_fork_all_tasks) seems to be the >> best place to put the IO setup as it is the place where we are sure that >> sufficient rights are present for a user to access files and so to >> create them if necessary due to IO redirection. I will give it a try and >> give you my feedback. > > That seems like a good approach. > > Thanks for looking into it! > > mark > > >> Regards, >> Matthieu >>