It sounds like the second option (partition state on jobid or ...) would be a great general solution. Would people here be interested in a patch for this?
Cheers,
Clay

On Wed, May 30, 2012 at 1:03 PM, Moe Jette <[email protected]> wrote:
>
> Oddly enough, I ran across this problem just yesterday on an old
> CentOS distro.
> No great solutions, but here are some options:
> * Upgrade the OS
> * Modify SLURM to spread out the job directories into subdirectories,
>   say using a subdirectory based upon the last digit of the job ID. This
>   applies to code in only a couple of places, so it should be pretty
>   simple (search for "/environment" in src/slurmctld/job_mgr.c)
> * Configure MaxJobs=32000 in slurm.conf and force users to reduce the load
> * The directories are created only for batch jobs, so if you can run
>   interactive jobs (srun/salloc) this limit would not apply
>
> Quoting Clay Teeter <[email protected]>:
>
>> Thanks for the quick response! Given that our system is ext3 using a 2.6
>> kernel, is there anything we can do to configure slurm not to create
>> 32K directories/jobs in /var/slurm/state/?
>>
>> Cheers,
>> Clay
>>
>> On Wed, May 30, 2012 at 10:56 AM, Moe Jette <[email protected]> wrote:
>>
>>> See:
>>> http://superuser.com/questions/298420/cannot-mkdir-too-many-links
>>>
>>> With Ubuntu 12.04 (Linux 3.2.0-24) the limit is at least 200k rather
>>> than 32k.
>>>
>>> Quoting Clay Teeter <[email protected]>:
>>>
>>>> Hi Group,
>>>>
>>>> Anyone know how I might troubleshoot this error message?
>>>>
>>>> [2012-05-15T19:34:27] _slurm_rpc_submit_batch_job: I/O error writing
>>>> script/environment to file
>>>> [2012-05-15T19:34:28] error: mkdir(/var/slurm/state/job.3258740) error
>>>> Too many links
>>>>
>>>> Cheers,
>>>> Clay
