Sure, I'll volunteer. Comments inline.

On Wed, May 30, 2012 at 3:03 PM, Moe Jette <[email protected]> wrote:

>
> If you are volunteering, then sure.
> * Basing the subdirectory off the last digit or two of the job id
> should be easiest
>

For clarity, this is the layout being requested (bucketed by the last two
digits of the job ID):
/45/xxxx45
/45/yyyyy45
/46/xxxx46
/46/yyyyy46
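
As a rough sketch of the mapping (Python for brevity, not actual SLURM code;
the function name job_dir, the two-digit default, and the zero-padded bucket
names are my assumptions; the "job.<id>" naming is taken from the mkdir error
in the original report):

```python
import os

def job_dir(state_dir, job_id, digits=2):
    """Return the bucketed path for a job's state directory.

    The bucket is the last `digits` decimal digits of the job ID,
    zero-padded so every bucket name has the same width.
    """
    bucket = job_id % (10 ** digits)
    return os.path.join(state_dir, f"{bucket:0{digits}d}", f"job.{job_id}")

# Job 3258740 from the original error would land in bucket "40".
print(job_dir("/var/slurm/state", 3258740))
```

With two digits, each bucket holds roughly 1/100th of the jobs, so the
per-directory count should stay well under the ext3 limit.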


> * Code needs to be added to create these new directories either on
> demand or at slurmctld startup
> * I would suggest making the new logic conditional upon a SLURM
> build-time option
>

Why wouldn't this be a slurm.conf option?  Seems easier and more flexible
than a build option.


> * Existing directories need to be moved or their jobs will be killed
> when slurmctld restarts using the new logic
>

On restart, existing job directories would be reconciled, i.e. moved into
the new subdirectories.
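
Roughly, the reconciliation pass could look like the sketch below. This is
Python for illustration only, not the eventual slurmctld change; the function
name reconcile, the "job.<id>" pattern, and the two-digit bucket names are
assumptions on my part.

```python
import os
import re
import shutil

# Top-level entries that look like old-style job directories, e.g. "job.3258740".
JOB_DIR_RE = re.compile(r"^job\.(\d+)$")

def reconcile(state_dir, digits=2):
    """Move flat job.<id> directories into per-bucket subdirectories.

    Returns the list of new paths. Entries that are not job directories
    (including already-created bucket directories) are left alone.
    """
    moved = []
    for name in sorted(os.listdir(state_dir)):
        match = JOB_DIR_RE.match(name)
        src = os.path.join(state_dir, name)
        if not match or not os.path.isdir(src):
            continue
        bucket = int(match.group(1)) % (10 ** digits)
        bucket_dir = os.path.join(state_dir, f"{bucket:0{digits}d}")
        os.makedirs(bucket_dir, exist_ok=True)
        dst = os.path.join(bucket_dir, name)
        shutil.move(src, dst)
        moved.append(dst)
    return moved
```

Running it a second time is a no-op, since nothing matching job.<id> is left
at the top level.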


> Quoting Clay Teeter <[email protected]>:
>
> > It sounds like the second option (partition state on jobid or ...) would be
> > a great general solution.  Would people here be interested in a patch for
> > this?
> >
> > Cheers
> > Clay
> >
> > On Wed, May 30, 2012 at 1:03 PM, Moe Jette <[email protected]> wrote:
> >
> >>
> >> Oddly enough, I ran across this problem just yesterday on an old
> >> CentOS distro.
> >> No great solutions, but here are some options:
> >> * Upgrade the OS
> >> * Modify SLURM to spread out the job directories into subdirectories,
> >> say using a subdirectory based upon the last digit of the job ID. This
> >> applies to code in only a couple of places, so it should be pretty
> >> simple (search for "/environment" in src/slurmctld/job_mgr.c)
> >> * Configure MaxJobs=32000 in slurm.conf and force users to reduce the load
> >> * The directories are created only for batch jobs, so if you can run
> >> interactive jobs (srun/salloc) this limit would not apply
> >>
> >>
> >> Quoting Clay Teeter <[email protected]>:
> >>
> >> > Thanks for the quick response!  Given that our system is ext3 using a
> >> > 2.6 kernel, is there anything that we can do to configure slurm not to
> >> > create 32K directories/jobs in /var/slurm/state/?
> >> >
> >> > Cheers,
> >> > Clay
> >> >
> >> > On Wed, May 30, 2012 at 10:56 AM, Moe Jette <[email protected]> wrote:
> >> >
> >> >>
> >> >> See:
> >> >> http://superuser.com/questions/298420/cannot-mkdir-too-many-links
> >> >>
> >> >> With Ubuntu 12.04 (Linux 3.2.0-24) the limit is at least 200k rather
> >> >> than 32k.
> >> >>
> >> >> Quoting Clay Teeter <[email protected]>:
> >> >>
> >> >> > Hi Group,
> >> >> >
> >> >> > Anyone know how I might troubleshoot this error message?
> >> >> >
> >> >> > [2012-05-15T19:34:27] _slurm_rpc_submit_batch_job: I/O error writing script/environment to file
> >> >> > [2012-05-15T19:34:28] error: mkdir(/var/slurm/state/job.3258740) error Too many links
> >> >> >
> >> >> > Cheers,
> >> >> > Clay
> >> >> >
> >> >>
> >> >>
> >> >
> >>
> >>
> >
>
>
