2015-09-08 12:55 GMT+02:00 Raymond Wan <[email protected]>:

>
> Dear all,
>
> I'm trying to figure out how to configure a "cluster" with a single
> computer (i.e., execution and master node are the same).  After I
> figure this out, I hope that setting up a cluster with multiple nodes
> is not too difficult.
>
> In particular, I think the default setting permits only a single job
> per node at a time.  However, I'd like to set things up so that more
> than one job can run at a time.
>
> I'm looking at the CPU Management User and Administrator Guide [1],
> and in particular, the Consumable Resources in Slurm page [2].  I hope
> I'm on the right track?
>
> In the examples, I understand the memory (CR_Memory) example.  But, I
> don't quite understand the CR_CPU_Memory example.  What is -N and -n?
> The man page says -N is the number of nodes...so with only one node,
> that is meaningless in my case.  -n is "number of tasks".  Is "number
> of tasks" the same as "number of CPUs"?
>
> Is there a reason why the example used both -N and -n and not just -n?
> Do the two parameters interact somehow?
>
> If I have a computer with 2 cores and 10 threads each, that is 20
> CPUs.  So, -n can range from 1 to 20?
>
> And under SelectTypeParameters, if I set CR_CPU_Memory, then a job
> enters the running state if both CPU and Memory are available.
>
> So far, I hope I'm correct?  If so, then my "real" question is that
> the jobs I would like to run are mainly I/O intensive.  CPU and Memory
> usage is important, but the bottleneck is probably disk I/O.  If I've
> set up k disk partitions using object store, I'd like no more than k
> jobs to run at a time and I'd like each one to write to a different
> partition.
>
> I *think* this is "impossible" to do since it would be hard to force
> users to write to one partition and not any others.  But, I thought
> I'd ask anyway in case there is something within SLURM that I've
> missed.  Any suggestions?
>

You can use something like this:
 https://github.com/fafik23/slurm_plugins/blob/master/unshare/unshare.c
It uses the unshare syscall (Linux namespaces) and unmounts the specified
filesystems. You can also use licenses to limit the number of jobs that are
using a given mountpoint, but that's not a real IOPS threshold. Currently
I don't know of any Linux mechanism that limits a process to a specified
number of I/O operations per second. On our side we've been considering
writing our own FUSE filesystem with this functionality.

If you are using local disks, gres may fit better than licenses...
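
As a rough, untested sketch of the two options (licenses vs. gres): the
fragment below assumes k = 4 partitions and a single node. The node name
"node01", the license name "scratch", the gres name "disk", and the
CPUs/RealMemory values are all made up for illustration. It uses
select/cons_res as on the Consumable Resources page; newer Slurm releases
use select/cons_tres instead.

```
# slurm.conf (fragment; names and sizes are placeholders)
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory

# Option 1: licenses. A cluster-wide counter; a job asking for one
# "scratch" license only starts while fewer than 4 are in use.
Licenses=scratch:4

# Option 2: gres. A per-node counter, which suits local disks better.
GresTypes=disk
NodeName=node01 CPUs=20 RealMemory=64000 Gres=disk:4 State=UNKNOWN
PartitionName=debug Nodes=node01 Default=YES State=UP

# gres.conf on node01 (a separate file):
#   Name=disk Count=4

# Submitting a job that takes one slot:
#   sbatch --licenses=scratch:1 job.sh
#   sbatch --gres=disk:1 job.sh
```

Either way this only caps how many such jobs run at once. It does not
decide which partition a job writes to, and it does not limit IOPS.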

cheers,
marcin
