If you are volunteering, then sure.
* Basing the subdirectory off the last digit or two of the job ID should be easiest.
* Code needs to be added to create these new directories, either on demand or at slurmctld startup.
* I would suggest making the new logic conditional upon a SLURM build-time option.
* Existing directories need to be moved, or their jobs will be killed when slurmctld restarts using the new logic.
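A minimal sketch of the first two points, assuming a layout of `<state_dir>/hash.<N>/job.<job_id>` where N is the last decimal digit of the job ID. The `hash.` prefix, `JOB_HASH_BUCKETS` and the three function names are hypothetical and not taken from the SLURM source; `state_dir` stands for the `StateSaveLocation` directory (`/var/slurm/state` in the quoted log), and the 0700 mode is chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical bucket count: 10 keys on the last digit of the job ID,
 * 100 would key on the last two digits. */
#define JOB_HASH_BUCKETS 10

/* Write "<state_dir>/hash.<job_id % JOB_HASH_BUCKETS>" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int job_hash_dir(char *buf, size_t len, const char *state_dir,
                        uint32_t job_id)
{
    int n;

    assert(buf != NULL && state_dir != NULL);
    n = snprintf(buf, len, "%s/hash.%u", state_dir,
                 (unsigned) (job_id % JOB_HASH_BUCKETS));
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Write "<state_dir>/hash.<bucket>/job.<job_id>" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int job_state_dir(char *buf, size_t len, const char *state_dir,
                         uint32_t job_id)
{
    int n;

    assert(buf != NULL && state_dir != NULL);
    n = snprintf(buf, len, "%s/hash.%u/job.%u", state_dir,
                 (unsigned) (job_id % JOB_HASH_BUCKETS), (unsigned) job_id);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Create the bucket directory on demand (an existing one is fine), then
 * the per-job directory inside it. Returns 0 on success, -1 on error. */
static int make_job_state_dir(const char *state_dir, uint32_t job_id)
{
    char path[4096];

    if (job_hash_dir(path, sizeof(path), state_dir, job_id) != 0)
        return -1;
    if (mkdir(path, 0700) != 0 && errno != EEXIST)
        goto fail;
    if (job_state_dir(path, sizeof(path), state_dir, job_id) != 0)
        return -1;
    if (mkdir(path, 0700) != 0)
        goto fail;
    return 0;

fail:
    /* EMLINK is reported by strerror() as "Too many links" */
    fprintf(stderr, "mkdir(%s) error %s\n", path, strerror(errno));
    return -1;
}
```

Because job IDs are handed out sequentially, the last digit cycles evenly, so each of the ten buckets should hold roughly a tenth of the job directories. On ext3 that raises the practical ceiling from about 32K job directories to roughly ten times that.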
Quoting Clay Teeter <[email protected]>:

> It sounds like the second option (partition state on jobid or ...) would be
> a great general solution. Would people here be interested in a patch for
> this?
>
> Cheers
> Clay
>
> On Wed, May 30, 2012 at 1:03 PM, Moe Jette <[email protected]> wrote:
>
>> Oddly enough, I ran across this problem just yesterday on an old
>> CentOS distro.
>> No great solutions, but here are some options:
>> * Upgrade the OS
>> * Modify SLURM to spread out the job directories into subdirectories,
>> say using a subdirectory based upon the last digit of the job ID. This
>> applies to code in only a couple of places, so it should be pretty
>> simple (search for "/environment" in src/slurmctld/job_mgr.c)
>> * Configure MaxJobs=32000 in slurm.conf and force users to reduce the load
>> * The directories are created only for batch jobs, so if you can run
>> interactive jobs (srun/salloc) this limit would not apply
>>
>> Quoting Clay Teeter <[email protected]>:
>>
>> > Thanks for the quick response! Given that our system is ext3 using a 2.6
>> > kernel, is there anything that we can do to configure slurm not to create
>> > 32K directories/jobs in /var/slurm/state/?
>> >
>> > Cheers,
>> > Clay
>> >
>> > On Wed, May 30, 2012 at 10:56 AM, Moe Jette <[email protected]> wrote:
>> >
>> >> See:
>> >> http://superuser.com/questions/298420/cannot-mkdir-too-many-links
>> >>
>> >> With Ubuntu 12.04 (Linux 3.2.0-24) the limit is at least 200k rather than
>> >> 32k.
>> >>
>> >> Quoting Clay Teeter <[email protected]>:
>> >>
>> >> > Hi Group,
>> >> >
>> >> > Anyone know how I might troubleshoot this error message?
>> >> >
>> >> > [2012-05-15T19:34:27] _slurm_rpc_submit_batch_job: I/O error writing
>> >> > script/environment to file
>> >> > [2012-05-15T19:34:28] error: mkdir(/var/slurm/state/job.3258740) error Too many links
>> >> >
>> >> > Cheers,
>> >> > Clay
