Thank you very much. I will try the patches from git.
Best regards,
Andrej

On 10/23/2012 10:31 PM, Matthieu Hautreux wrote:
> Andrej,
>
> a set of patches was applied to the current dev branch of Slurm (2.5,
> the current master git branch) and should correct the issue you
> reported concerning the behavior of the task/cgroup memory subsystem
> logic.
>
> According to Moe, the official 2.5 version should be available within
> a month. If you want to try the fix against slurm-2.4.3, I can send
> you the patches, or you can get them from
> https://github.com/SchedMD/slurm/commits/master using the commit range
> 66e80a49ff...9a548ec199.
>
> Regards,
> Matthieu
>
> 2012/10/10 Andrej Filipcic <[email protected]>:
>>
>> Thanks for the extensive info. In the meantime, I have disabled
>> task/affinity and am using only task/cgroup, which gives a much lower
>> number of release_agent calls. Waiting for the new development
>> version then...
>>
>> Best regards,
>> Andrej
>>
>> On 10/10/2012 02:39 PM, Matthieu Hautreux wrote:
>>> Hi,
>>>
>>> the locking that you have removed is necessary to ensure the proper
>>> behavior of the cgroup directory creation. Removing it could result
>>> in the memory cgroup plugin no longer working as expected, with some
>>> jobs or job steps not being run in a memory cgroup at all.
>>>
>>> This is mostly due to the fact that the cgroup directory hierarchy
>>> (uid/job_id/step_id) is removed automatically by the cgroup release
>>> agent mechanism and not directly by the cgroup logic of SLURM. As a
>>> result, when creating a new step, you can hit a situation where you
>>> check that the job directory is present and then add the step
>>> directory, but in the meantime a release agent has removed the job
>>> directory, so the creation fails. To avoid that, the flock on the
>>> cgroup subsystem root directory was introduced. This logic was not
>>> designed with "high throughput" computing in mind, so it does not
>>> really work with your workload.
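[Editor's note: the check-then-create race and the flock-based serialization described above can be sketched in plain shell. The paths below are an illustrative temp-directory stand-in for the memory cgroup mount point; this is not the actual SLURM release_common script.]

```shell
#!/bin/sh
# Sketch of the serialization described above: creating a step
# directory and removing a job directory (release-agent side) both
# take an exclusive flock on the cgroup subsystem root, so the agent
# cannot rmdir the job directory between the existence check and the
# step mkdir.  A temp tree stands in for /sys/fs/cgroup/memory.
base=$(mktemp -d)                 # stand-in for the subsystem mount point
jobdir=$base/uid_1000/job_42      # hypothetical uid/job_id hierarchy
mkdir -p "$jobdir"

# slurmstepd side: check-then-create, done under the subsystem-wide lock
flock -x "$base" -c "[ -d '$jobdir' ] && mkdir '$jobdir/step_0'"

# the release agent would take the same lock before removing an empty
# job directory, closing the race window, e.g.:
#   flock -x "$base" -c "rmdir '$jobdir'"

created=$(ls "$jobdir")
echo "$created"                   # prints: step_0
rm -rf "$base"
```

Because every creation and every release-agent removal queues on the same lock, this is safe but serializes the whole subsystem, which is the contention point the rest of the thread is about.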
>>>
>>> Mark Grondona has added the ability to remove the step-level cgroup
>>> directory directly in the SLURM logic in slurm-2.4.x, and I have
>>> also worked on applying the same logic to both the job and the user
>>> level of the hierarchy, but that is not yet included in any official
>>> version of SLURM. I will work on it again and hope to have something
>>> that works better for slurm-2.5 (most probably in November,
>>> according to SchedMD). I hope the speedup will be sufficient for
>>> you.
>>>
>>> In the meantime, I would suggest no longer using the cgroup memory
>>> logic if you experience the issue I mentioned at the beginning of
>>> this email.
>>>
>>> Best regards,
>>> Matthieu
>>>
>>> 2012/10/1 Andrej Filipcic <[email protected]>:
>>>> I found out that release_memory is called many times for the same
>>>> path, unlike the other subsystems (e.g. cpuset): about 4k calls
>>>> for 100 jobs.
>>>>
>>>> It seems to work much better if I replace this line:
>>>>     flock -x ${mountdir} -c "$0 sync $@"
>>>> with
>>>>     flock -x -w 2 ${rmcg} -c "$0 sync $@"
>>>>
>>>> So, locking on the directory to be removed. I am not sure whether
>>>> this has any side effects... But at least there is no longer an
>>>> excessive number of processes created, and the memory cgroup tree
>>>> is cleaned properly after all the jobs finish.
>>>>
>>>> Cheers,
>>>> Andrej
>>>>
>>>> On 09/30/2012 01:19 PM, Andrej Filipcic wrote:
>>>>> Hi,
>>>>>
>>>>> On 64-core nodes, while submitting many short jobs, the number of
>>>>> calls to the release_memory agent (a symlink to release_common
>>>>> from the slurm 2.4.3 release) can be extremely high. The script
>>>>> seems to be too slow for memory, which results in a few tens of
>>>>> thousands of agent processes being spawned in a short time after
>>>>> job completion, and those processes stay alive for a long time. In
>>>>> extreme cases, the pid numbers can be exhausted, preventing new
>>>>> processes from being spawned. To fix it partially, I commented out
>>>>> the "sleep 1" in the sync part of the script.
>>>>> But there can still be up to a few thousand processes after 64
>>>>> jobs complete at roughly the same time.
>>>>>
>>>>> Each job has about 10 processes, so the number of agent calls can
>>>>> be high.
>>>>>
>>>>> I did not notice this on nodes with a lower number of cores/jobs,
>>>>> and the problem is not present for the other cgroup subsystems.
>>>>>
>>>>> Any advice on how to fix this problem?
>>>>>
>>>>> Cheers,
>>>>> Andrej

--
_____________________________________________________________
prof. dr. Andrej Filipcic,   E-mail: [email protected]
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674    Fax: +386-1-477-3166
-------------------------------------------------------------
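[Editor's note: the key ingredient of Andrej's workaround quoted above is flock's `-w` option, which bounds the wait instead of letting every release-agent invocation queue indefinitely on the shared lock. The behavior can be demonstrated in isolation; the temp directory is a stand-in for a cgroup directory and the timings are illustrative.]

```shell
#!/bin/sh
# Demonstrates the bounded wait that `flock -w 2` adds in Andrej's
# workaround: a contender gives up after the timeout instead of
# blocking forever, which is what allowed tens of thousands of
# release-agent processes to pile up.  The temp directory stands in
# for a cgroup directory; this is not the release_common script.
dir=$(mktemp -d)

# a background process holds an exclusive lock on the directory for 4s
flock -x "$dir" -c "sleep 4" &
sleep 1                           # let the holder acquire the lock

# -w 1: wait at most 1 second for the lock, then fail instead of queueing
if flock -x -w 1 "$dir" -c "true"; then
    result=acquired
else
    result="timed out"
fi
echo "$result"                    # prints: timed out

wait                              # reap the background holder
rm -rf "$dir"
```

In the workaround, the failure path simply means one redundant sync is skipped; since release_memory was being invoked many times for the same path, dropping the duplicates is what kept the process count down while the tree was still cleaned up.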
